Lecture Notes in 
Computer Science 



1576 



S. Doaitse Swierstra (Ed.) 



Programming 
Languages 
and Systems 



8th European Symposium on Programming, ESOP’99 
Held as Part of the Joint European Conferences 
on Theory and Practice of Software, ETAPS’99 
Amsterdam, The Netherlands, March 1999 
Proceedings 





Springer 




Lecture Notes in Computer Science 1576 

Edited by G. Goos, J. Hartmanis and J. van Leeuwen 








S. Doaitse Swierstra (Ed.) 



Programming 
Languages 
and Systems 



8th European Symposium on Programming, 
ESOP’99 

Held as Part of the Joint European Conferences 
on Theory and Practice of Software, ETAPS’99 
Amsterdam, The Netherlands, March 22-28, 1999 
Proceedings 




Springer 




Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 

Volume Editor 
S. Doaitse Swierstra 

Utrecht University, Department of Computer Science 
P.O. Box 80.089, 3508 TB Utrecht, The Netherlands 
E-mail: swierstra@cs.uu.nl 



Cataloging-in-Publication data applied for 



Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Programming languages and systems ; proceedings / 8th European Symposium on 
Programming, ESOP ’99, held as part of the Joint European Conferences on Theory 
and Practice of Software, ETAPS ’99, Amsterdam, The Netherlands, March 22 - 28, 
1999. S. Doaitse Swierstra (ed.). - Berlin ; Heidelberg ; New York ; Barcelona 
; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 
(Lecture notes in computer science ; Vol. 1576) 

ISBN 3-540-65699-5 



CR Subject Classification (1998): D.3, F.3, F.4, D.1-2, E.1 
ISSN 0302-9743 

ISBN 3-540-65699-5 Springer-Verlag Berlin Heidelberg New York 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are 
liable for prosecution under the German Copyright Law. 

© Springer-Verlag Berlin Heidelberg 1999 
Printed in Germany 

Typesetting: Camera-ready by author 

SPIN: 10703074 06/3142 - 5 4 3 2 1 0 Printed on acid-free paper 




Preface 



This is the second time that ESOP has formed part of the ETAPS cluster of 
conferences, workshops, working group meetings and other associated activities. 
One of the results of colocating so many conferences is a reduction in the number 
of possibilities to submit a paper to a European conference and the increased 
competition between conferences that occurs when boundaries between individ- 
ual conferences have not yet become well established. This may have been the 
reason for the fact that only 44 submissions were received this year. On the other 
hand we feel that the average quality of submissions has gone up, and thus the 
program committee was able to select 18 good papers, only one less than the 
year before. 

The program committee did not meet physically, and all discussion was done 
using a Web-driven data base system. Despite some mixed feelings there is an 
overall tendency to appreciate the extra time available for giving papers a sec- 
ond look and really going into comments made by other program committee 
members. 

I want to thank my fellow program committee members for the work they 
have put into the refereeing process and the valuable feedback they have given to 
authors. I want to thank the referees for their work and many detailed comments, 
and finally I want to thank everyone who has submitted a paper: without authors, 
no conference. 

Utrecht, January 1999 Doaitse Swierstra 

ESOP ’99 Chairman 

Foreword 



ETAPS’99 is the second instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference that was 
established in 1998 by combining a number of existing and new conferences. 
This year it comprises five conferences (FOSSACS, FASE, ESOP, CC, TACAS), 
four satellite workshops (CMCS, AS, WAGA, CoFI), seven invited lectures, two 
invited tutorials, and six contributed tutorials. 

The events that comprise ETAPS address various aspects of the system de- 
velopment process, including specification, design, implementation, analysis and 
improvement. The languages, methodologies and tools which support these ac- 
tivities are all well within its scope. Different blends of theory and practice are 
represented, with an inclination towards theory with a practical motivation on 
one hand and soundly-based practice on the other. Many of the issues involved 
in software design apply to systems in general, including hardware systems, and 
the emphasis on software is not intended to be exclusive. 

ETAPS is a loose confederation in which each event retains its own identity, 
with a separate programme committee and independent proceedings. Its format 
is open-ended, allowing it to grow and evolve as time goes by. Contributed talks 
and system demonstrations are in synchronized parallel sessions, with invited 
lectures in plenary sessions. Two of the invited lectures are reserved for “unify- 
ing” talks on topics of interest to the whole range of ETAPS attendees. As an 
experiment, ETAPS’99 also includes two invited tutorials on topics of special 
interest. The aim of cramming all this activity into a single one-week meeting 
is to create a strong magnet for academic and industrial researchers working on 
topics within its scope, giving them the opportunity to learn about research in 
related areas, and thereby to foster new and existing links between work in areas 
that have hitherto been addressed in separate meetings. 

ETAPS’99 has been organized by Jan Bergstra of CWI and the University of 
Amsterdam together with Frans Snijders of CWI. Overall planning for ETAPS’99 
was the responsibility of the ETAPS Steering Committee, whose current 
membership is: 

André Arnold (Bordeaux), Egidio Astesiano (Genoa), Jan Bergstra 
(Amsterdam), Ed Brinksma (Enschede), Rance Cleaveland (Stony 
Brook), Pierpaolo Degano (Pisa), Hartmut Ehrig (Berlin), José Fiadeiro 
(Lisbon), Jean-Pierre Finance (Nancy), Marie-Claude Gaudel (Paris), 
Susanne Graf (Grenoble), Stefan Jähnichen (Berlin), Paul Klint 
(Amsterdam), Kai Koskimies (Tampere), Tom Maibaum (London), Ugo 
Montanari (Pisa), Hanne Riis Nielson (Aarhus), Fernando Orejas 
(Barcelona), Don Sannella (Edinburgh), Gert Smolka (Saarbrücken), 
Doaitse Swierstra (Utrecht), Wolfgang Thomas (Aachen), Jerzy Tiuryn 
(Warsaw), David Watt (Glasgow) 

ETAPS’99 has received generous sponsorship from: 

KPN Research 
Philips Research 

The EU programme “Training and Mobility of Researchers” 

CWI 

The University of Amsterdam 

The European Association for Programming Languages and Systems 
The European Association for Theoretical Computer Science 

I would like to express my sincere gratitude to all of these people and orga- 
nizations, the programme committee members of the ETAPS conferences, the 
organizers of the satellite events, the speakers themselves, and finally Springer- 
Verlag for agreeing to publish the ETAPS proceedings. 

Edinburgh, January 1999 Donald Sannella 

ETAPS Steering Committee Chairman 

Functional Reactive Programming 



Paul Hudak 

Yale University, New Haven, CT 06518, USA 
paul.hudak@yale.edu 
www.cs.yale.edu/users/hudak.html 



Abstract. Functional reactive programming, or FRP, is a style of pro- 
gramming based on two key ideas: continuous time-varying behaviors, 
and event-based reactivity. FRP is the essence of Fran [1,2], a domain- 
specific language for functional reactive graphics and animation, and has 
recently been used in the design of Frob [3,4], a domain-specific language 
for functional vision and robotics. In general, FRP can be viewed as 
an interesting language for describing hybrid systems, which are systems 
comprised of both analog (continuous) and digital (discrete) subsystems. 
Continuous behaviors can be thought of simply as functions from time to 
some value: Behavior a = Time -> a. For example: an image behavior 
may represent an animation; a Cartesian-point behavior may be a mouse; 
a velocity-vector behavior may be the control vector for a robot; and a 
tuple-of-distances behavior may be the input from a robot’s sonar array. 
Both continuous behaviors and event-based reactivity have interesting 
properties worthy of independent study, but their integration is partic- 
ularly interesting. At the core of the issue is that events are intended 
to cause discrete shifts in declarative behavior; i.e. not just shifts in the 
state of reactivity. Being declarative, the natural desire is for everything 
to be first-class and higher-order. But this causes interesting clashes in 
frames of reference, especially when time and space transformations are 
applied. In this talk the fundamental ideas behind FRP are presented, 
along with a discussion of various issues in its formal semantics. 

This is joint work with Conal Elliott at Microsoft Research, and John 
Peterson at Yale. 



References 

1. Conal Elliott. Modeling interactive 3D and multimedia animation with an embedded 
language. In Proceedings of the first conference on Domain-Specific Languages, pages 
285-296. USENIX, October 1997. 

2. Conal Elliott and Paul Hudak. Functional reactive animation. In International 
Conference on Functional Programming, pages 163-173, June 1997. 

3. John Peterson, Paul Hudak, and Conal Elliott. Lambda in motion: Controlling 
robots with Haskell. In First International Workshop on Practical Aspects of 
Declarative Languages. SIGPLAN, Jan 1999. 

4. A. Reid, J. Peterson, G. Hager, and P. Hudak. Prototyping real-time vision sys- 
tems: An experiment in DSL design. To appear Proc. Int. Conference on Software 
Engineering, May 1999. 



S.D. Swierstra (Ed.): ESOP/ETAPS’99, LNCS 1576, pp. 1-1, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 




A Decidable Logic 

for Describing Linked Data Structures 

Abstract. This paper aims to provide a better formalism for describing 
properties of linked data structures (e.g., lists, trees, graphs), as well as 
the intermediate states that arise when such structures are destructively 
updated. The paper defines a new logic that is suitable for these purposes 
(called Lr, for “logic of reachability expressions”). We show that Lr is 
decidable, and explain how Lr relates to two previously defined structure- 
description formalisms (“path matrices” and “static shape graphs”) by 
showing how an arbitrary shape descriptor from each of these formalisms 
can be translated into an Lr formula. 



1 Introduction 

This paper aims to provide a better formalism for describing properties of linked 
data structures (e.g., lists, trees, graphs). In past work with the same motivation, 
a variety of different formalisms have been developed — including “static shape 
graphs” [14,15,17,12,3,23,1,19,27,21,20,22], “path matrices” [9,11], “graph 
types” [16], and the ADDS annotation formalism [10] — and several previously 
known formalisms have been exploited — including graph grammars [6] and 
monadic second-order logic [13]. For lack of a better term, we will use the phrase 
structure-description formalisms to refer to such formalisms in a generic sense. 

In this paper, we define a new logic (called Lr, for “logic of reachability 
expressions”), and show that Lr is suitable for describing properties of linked 
data structures. We show that Lr is decidable. We also show in detail how 
Lr relates to two of the previously defined structure-description formalisms: In 
Section 3, we show how a generalization of Hendren’s path-matrix descriptors [9, 
11] can be represented by Lr formulae; in Section 4, we show how the variant 
of static shape graphs defined in [21] can be represented by Lr formulae. In this 
way, Lr provides insight into the expressive power of path matrices and static 
shape graphs. 

The benefits of our work include the following: 

— The logic Lr can be used as an annotation language to express loop invariants 
and pre- and post-conditions of statements and procedures. Annotations are 
important not only as a means of documenting programs, but also as the 
basis for analyzing and reasoning about programs in a modular fashion. Our 
work has two advantages: 

• The logic Lr is quite expressive (e.g., strictly more expressive than the 
formalism used by Hendren et al. [10]). The added expressibility is im- 
portant for describing the intermediate states that arise when linked data 
structures are destructively updated. 

• The logic Lr is decidable, which means that there is an algorithm that 
determines, for every formula in the logic, if the formula is satisfiable. 
In other words, it is possible to determine if there is any store at all 
that satisfies a given formula. In principle, this ability can be used to 
provide some sanity checks on the formulae that a user employs — e.g., a 
warning can be issued if the user employs a formula that is unsatisfiable. 

— Our work makes contributions on the question of extracting information 
from the results of program analysis. Although the subject of the paper is 
not primarily algorithms for analyzing programs that manipulate linked data 
structures, the decidability of Lr — together with the constructions given in 
Sections 3 and 4 for encoding other structure-description formalisms in Lr 
— has interesting consequences for extracting information from the results 
of program analyses: Lr provides a way to amplify the results obtained from 
known pointer-analysis, alias-analysis, and shape-analysis algorithms in the 
following ways: 

• For a structure-description formalism in which each structure descriptor 
corresponds to an Lr formula, as is the case for path matrices (Section 3) 
and static shape graphs (Section 4), it is possible to determine if there 
is any store at all that corresponds to a given structure descriptor. This 
lets us determine whether a given structure descriptor contains any useful 
information. 

• Pointer-analysis, alias-analysis, and shape-analysis algorithms necessar- 
ily compute structure descriptors that over-approximate the 
pointer/alias/shape relationships that actually arise. This kind of loss of 
precision is intrinsic to static-analysis; however, many of the techniques 
that have been proposed in the literature have the feature that additional 
imprecision crops up when information is extracted from the structure 
descriptor for a particular program point. For instance, with the three- 
valued logic used for shape analysis in [20,22], a formula that queries 
for a specific piece of information sometimes evaluates to “unknown”, 
even when, in all of the stores that the static shape graph represents, 
the formula evaluates to a definite true or false value. 

For a structure-description formalism in which each structure descriptor 
corresponds to an Lr formula, decidability gives us a mechanism for 
reading out information obtained by existing algorithms, without any 
additional loss of precision: If $\varphi$ is the formula that represents the shape 
descriptor and $\psi$ is the formula that represents the query, we are 
interested in whether $\varphi \Rightarrow \psi$ always holds (or, equivalently, whether 
$\neg(\varphi \Rightarrow \psi)$ is unsatisfiable). Thus, in principle, the machinery 
developed in this paper allows us to take the structure descriptors computed 
by existing techniques, and extract information from them that is more 
precise than that envisioned by the inventors of these formalisms. 

— For many of the structure-description formalisms used in the literature, very 
little is known about basic decision problems associated with them. Mapping 
a structure-description formalism F into Lr can provide a way to analyze 
many basic decision problems of F. 

For instance, a decision problem of special interest for structure-description 
formalisms that are used in abstract interpretation is the inclusion problem 
(i.e., whether the set of stores that structure descriptor $D_1$ represents is a 
subset of the set of stores that $D_2$ represents). When the inclusion problem 
is decidable, it is possible to check (i) whether one structure descriptor sub- 
sumes another (and hence the second need not be retained), and (ii) whether 
a simpler structure descriptor is a conservative replacement of a larger one, 
which is useful in widening. Thus, the inclusion problem is important for 
reducing both the time and space used during abstract interpretation. 

For a structure-description formalism in which each structure descriptor 
corresponds to an Lr formula, the inclusion of structure descriptor $D_1$ 
(represented by formula $\varphi_1$) in $D_2$ (represented by $\varphi_2$) is a matter of testing 
whether $\varphi_1 \Rightarrow \varphi_2$ always holds (or, equivalently, whether 
$\neg(\varphi_1 \Rightarrow \varphi_2)$ is unsatisfiable). 
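
Both uses of decidability listed above (reading out a query and testing inclusion) reduce a validity question to an unsatisfiability question. As a short worked restatement, under the assumption that each formula exactly characterizes the stores of its descriptor, and writing $\gamma(D)$ for the set of stores that descriptor $D$ represents (a notation introduced here for illustration only, not taken from the paper):

```latex
\begin{align*}
\gamma(D_1) \subseteq \gamma(D_2)
  &\iff \text{every store } S \text{ that satisfies } \varphi_1 \text{ also satisfies } \varphi_2 \\
  &\iff \varphi_1 \Rightarrow \varphi_2 \text{ holds in every store} \\
  &\iff \neg(\varphi_1 \Rightarrow \varphi_2) \text{ is unsatisfiable} \\
  &\iff \varphi_1 \wedge \neg\varphi_2 \text{ is unsatisfiable.}
\end{align*}
```

The query case has the same shape, with the descriptor formula in place of $\varphi_1$ and the query formula in place of $\varphi_2$, so a single satisfiability procedure serves both purposes.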

To date, our concern has been with developing the tools for describing prop- 
erties of linked data structures and obtaining a logic that is decidable. We have 
developed a decision procedure for Lr, although this procedure does not yield a 
practical algorithm. We have not yet investigated the complexity of the decision 
problem for Lr, nor looked for heuristic methods with acceptable performance 
in practice, but we plan to do so in future work. 

Two programs that will be used to illustrate our work are shown in Figure 1. 
The remainder of the paper is organized into six sections: Section 2 presents 
the logic we use for describing properties of linked data structures. Section 3 
shows how a generalization of Hendren’s path-matrix descriptors [9, 11] can be 
represented by Lr formulae. Section 4 shows how a variant of static shape graphs 
can be represented by Lr formulae. Section 5 discusses the issue of using Lr 
formulae to extract information from the results of program analyses. Section 6 
gives a sketch of the proof that Lr is decidable. Section 7 discusses related work. 



2 A Language for Stating Properties of Linked Data 
Structures 



Definition 2.1 Let PVar be the (finite) set of pointer variables in a given 
program. Let Sel be the set of pointer selectors (i.e., pointer-valued fields of 
structures) used in the program. We define the alphabet $\Sigma$ to be the following finite 
set of symbols: $\Sigma = Sel \cup \{pvar? \mid pvar \in PVar\} \cup \{\neg pvar? \mid pvar \in PVar\}$, 
with the intended meaning that pvar? denotes the cell pointed to by the pointer 
variable pvar, and $\neg$pvar? denotes cells not pointed to by pvar. A formula in 

    typedef struct elem_list_struct {
        int val;
        struct elem_list_struct *cdr;
    } Elements;

(a)

    Elements *elem_reverse(Elements *x
                           /* acyclic_list(x) */)
    {
        Elements *z, *y;
        y = NULL;
        while (x != NULL) {
            /* acyclic_list(x) */
            /* acyclic_list(y) */
            /* disjoint_lists(x,y) */
            z = y;
            y = x;
            x = x->cdr;
            y->cdr = z;
        }
        /* acyclic_list(y) */
        return y;
    }

(b)

    bool elem_delete(int delval, Elements *c)
    {   /* acyclic_list(c) */
        Elements *elem, *prev;
        for (prev = NULL, elem = c;
             elem != NULL;
             prev = elem, elem = elem->cdr) {
            if (elem->val == delval) {
                if (prev == NULL)
                    c = elem->cdr;
                else
                    prev->cdr = elem->cdr;
                free(elem);
                return TRUE;
            }
        }
        /* acyclic_list(c) */
        return FALSE;
    }

(c)

Fig. 1. (a) A C declaration of a linked-list type. (b) A program that uses 
destructive-updating operations to reverse a list. (c) A program that searches the list pointed to 
by variable c (using a “trailing pointer” prev) and deletes the first element whose val 
field equals delval. 

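the logic is defined as follows: 

$$
\begin{array}{rcll}
\varphi & ::= & pe_1 = pe_2 & \text{equality of pointer exps.} \\
 & \mid & pe_1\langle R\rangle pe_2 & \text{reachability constraint} \\
 & \mid & hs(pe\langle R\rangle) & \text{heap-sharing constraint} \\
 & \mid & al(pe\langle R\rangle) & \text{allocation constraint} \\
 & \mid & \neg\varphi & \text{negation} \\
 & \mid & \varphi_1 \wedge \varphi_2 & \text{conjunction} \\
 & \mid & \varphi_1 \vee \varphi_2 & \text{disjunction} \\
 & \mid & \varphi_1 \Rightarrow \varphi_2 & \text{implication} \\[1ex]
R & ::= & \epsilon & \text{empty path} \\
 & \mid & \emptyset & \text{empty lang.} \\
 & \mid & a & a \in \Sigma \\
 & \mid & R_1.R_2 & \text{concat.} \\
 & \mid & R_1 \mid R_2 & \text{union} \\
 & \mid & R^{*} & \text{Kleene star} \\[1ex]
pe & ::= & pvar & \text{pointer var.} \\
 & \mid & pe.sel & sel \in Sel
\end{array}
$$
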

We call R terms routing expressions, and refer to occurrences of pvar? and 
$\neg$pvar? in routing expressions as pointer-variable interrogations. 

We also use several shorthand notations: $hs(p)$ and $al(p)$ are shorthands 
for $hs(p\langle\epsilon\rangle)$ and $al(p\langle\epsilon\rangle)$, respectively. Similarly, $hs(p.sel)$ and $al(p.sel)$ are 
shorthands for $hs(p\langle sel\rangle)$ and $al(p\langle sel\rangle)$, respectively. $pe_1 \neq pe_2$ is a shorthand 
for $\neg(pe_1 = pe_2)$. $\varphi_1 \Leftrightarrow \varphi_2$ is a shorthand for $(\varphi_1 \Rightarrow \varphi_2) \wedge (\varphi_2 \Rightarrow \varphi_1)$. 

Example 2.2 For a pointer variable x, the formula 

$$acyclic\_list(x) \stackrel{\text{def}}{=} \neg x\langle cdr^{+}\rangle x \wedge \neg hs(x\langle cdr^{*}\rangle)$$

states that x points to an unshared acyclic list. The term $\neg x\langle cdr^{+}\rangle x$ signifies 
that a traversal that starts with the cell pointed to by x and follows one or 
more cdr fields never returns to the cell pointed to by x. The term $\neg hs(x\langle cdr^{*}\rangle)$ 
signifies that a traversal that starts with the cell pointed to by x and follows 
zero or more cdr fields never leads to a cell that is “heap shared”. (A cell c is 
“heap shared” if two or more cells have fields that point to c, or if there is a cell 
c' such that c'.car and c'.cdr both point to c [15,3,21,20].) 

Thus, a loop invariant for program elem_reverse can be written as follows: 

$$
\begin{array}{ll}
 & ((al(y.cdr) \vee al(z)) \Rightarrow y.cdr = z) \\
\wedge & acyclic\_list(x) \wedge acyclic\_list(y) \qquad (1) \\
\wedge & \neg x\langle cdr^{*}\rangle y \wedge \neg x\langle cdr^{*}\rangle z \wedge \neg y\langle cdr^{*}\rangle x \wedge \neg z\langle cdr^{*}\rangle x
\end{array}
$$

The first line of (1) states that y.cdr and z refer to the same list element when 
either one is allocated. The subformulae on the last line of (1) state that the 
x-list is disjoint from both the y-list and the z-list. 
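
An invariant of this kind can also be exercised at run time on a concrete list. The following is a minimal sketch in C, not part of the paper: the helper names (chain_is_acyclic, reaches_by_cdr_star, cdr_of, invariant_1, elem_reverse_checked) are hypothetical, z is initialised to NULL (an assumption; Fig. 1(b) leaves it unset on entry), and the heap-sharing conjunct is only approximated, because Floyd's cycle detection sees cycles in the cdr-chain but cannot see cells outside the chain that point into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct elem_list_struct {
    int val;
    struct elem_list_struct *cdr;
} Elements;

/* Floyd's cycle detection: true when the cdr-chain from x ends in NULL. */
static bool chain_is_acyclic(const Elements *x)
{
    const Elements *slow = x, *fast = x;
    while (fast != NULL && fast->cdr != NULL) {
        slow = slow->cdr;
        fast = fast->cdr->cdr;
        if (slow == fast)
            return false;
    }
    return true;
}

/* Reachability by zero or more cdr steps; call only on acyclic chains.
   A NULL target is never reached, matching the requirement that the
   target denote a location. */
static bool reaches_by_cdr_star(const Elements *x, const Elements *y)
{
    for (const Elements *p = x; p != NULL; p = p->cdr)
        if (p == y)
            return true;
    return false;
}

/* y.cdr in the logic: undefined (NULL here) when y itself is undefined. */
static const Elements *cdr_of(const Elements *y)
{
    return y != NULL ? y->cdr : NULL;
}

/* Run-time approximation of the three lines of the loop invariant. */
static bool invariant_1(const Elements *x, const Elements *y, const Elements *z)
{
    bool line1 = !(cdr_of(y) != NULL || z != NULL) || cdr_of(y) == z;
    bool line2 = chain_is_acyclic(x) && chain_is_acyclic(y);
    if (!line1 || !line2)
        return false;   /* also guarantees the traversals below terminate */
    return !reaches_by_cdr_star(x, y) && !reaches_by_cdr_star(x, z)
        && !reaches_by_cdr_star(y, x) && !reaches_by_cdr_star(z, x);
}

/* The reversal loop of Fig. 1(b), with the invariant asserted at the loop head. */
Elements *elem_reverse_checked(Elements *x)
{
    Elements *z = NULL, *y = NULL;   /* assumption: z starts out as NULL */
    while (x != NULL) {
        assert(invariant_1(x, y, z));
        z = y;
        y = x;
        x = x->cdr;
        y->cdr = z;
    }
    assert(chain_is_acyclic(y));
    return y;
}
```

Calling elem_reverse_checked on the head of a NULL-terminated list returns the reversed list and aborts through assert if the approximated invariant is ever violated; compiling with NDEBUG removes the checks.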

Example 2.3 A loop invariant for program elem_delete can be written as 
follows: 

$$
\begin{array}{ll}
 & acyclic\_list(c) \wedge c\langle cdr^{*}\rangle elem \\
\wedge & (al(prev) \Rightarrow (c\langle cdr^{*}\rangle prev \wedge prev.cdr = elem)) \qquad (2) \\
\wedge & (\neg al(prev) \Leftrightarrow elem = c)
\end{array}
$$

The subformula $c\langle cdr^{*}\rangle elem$ states that elem points somewhere in the list 
pointed to by c. The subformula on the last line of (2) states that prev is 
allocated (i.e., not null) if and only if elem and c point to different locations. From 
this, we can conclude that the location released by the statement free(elem) 
cannot be pointed to by c. 

The use of pointer-variable interrogations in routing expressions will be 
illustrated in Examples 3.3 and 3.6. 

We now define the semantics of formulae: 

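Definition 2.4 A store $S$ can be represented by a tuple $\langle Loc^S, env^S, \iota^S\rangle$, where 
$Loc^S$ is a set of locations, and $env^S$ and $\iota^S$ are functions 

$$
\begin{array}{l}
env^S : PVar \rightarrow (Loc^S \cup \{\bot\}) \\
\iota^S : Sel \rightarrow (Loc^S \cup \{\bot\}) \rightarrow (Loc^S \cup \{\bot\}),
\end{array}
$$

where $\iota^S(sel)$ is strict in $\bot$. 

The meaning of a pointer expression pe in a given store $S$, denoted by $[\![pe]\!]^S$ 
(where $[\![pe]\!]^S \in Loc^S \cup \{\bot\}$), is defined inductively, as follows: 

$$
\begin{array}{rcl}
[\![pvar]\!]^S & = & env^S(pvar) \\
[\![pe.sel]\!]^S & = & \iota^S(sel)([\![pe]\!]^S)
\end{array}
$$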

The language $L(R)$ of a routing expression $R$ is defined as is usual for regular 
expressions. However, because a word in $L(R)$ can contain occurrences of 
pointer-variable interrogations of the form pvar? and $\neg$pvar?, the meaning of a word is 
slightly nonstandard: The meaning of a word $w$ in a given store $S$, denoted by $[\![w]\!]^S$ 
(where $[\![w]\!]^S : (Loc^S \cup \{\bot\}) \rightarrow (Loc^S \cup \{\bot\})$), is defined inductively, as follows: 

$$
\begin{array}{rcl}
[\![\epsilon]\!]^S(l) & = & l \\[0.5ex]
[\![w.sel]\!]^S(l) & = & \iota^S(sel)([\![w]\!]^S(l)) \\[0.5ex]
[\![w.pvar?]\!]^S(l) & = & \begin{cases} [\![w]\!]^S(l) & \text{if } env^S(pvar) = [\![w]\!]^S(l) \\ \bot & \text{otherwise} \end{cases} \\[0.5ex]
[\![w.\neg pvar?]\!]^S(l) & = & \begin{cases} [\![w]\!]^S(l) & \text{if } env^S(pvar) \neq [\![w]\!]^S(l) \\ \bot & \text{otherwise} \end{cases}
\end{array}
$$




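The meaning of a formula in a given store $S$ is defined inductively, as follows: 

$$
\begin{array}{rcl}
[\![pe_1 = pe_2]\!]^S & = & ([\![pe_1]\!]^S = [\![pe_2]\!]^S) \\
[\![pe_1\langle R\rangle pe_2]\!]^S & = & \text{there exists } w \in L(R) \text{ s.t. } [\![w]\!]^S([\![pe_1]\!]^S) = [\![pe_2]\!]^S \text{ and } [\![pe_2]\!]^S \in Loc^S \\
[\![hs(pe\langle R\rangle)]\!]^S & = & \text{there exists } w \in L(R) \text{ s.t. } [\![w]\!]^S([\![pe]\!]^S) = l \text{ and } l \in Loc^S \text{ and} \\
 & & \text{there exist } l_1, l_2 \in Loc^S,\ sel_1, sel_2 \in Sel \text{ s.t. } \iota^S(sel_1)(l_1) = l \\
 & & \text{and } \iota^S(sel_2)(l_2) = l \text{ and either (i) } l_1 \neq l_2 \text{ or (ii) } sel_1 \neq sel_2 \\
[\![al(pe\langle R\rangle)]\!]^S & = & \text{there exists } w \in L(R) \text{ such that } [\![w]\!]^S([\![pe]\!]^S) \in Loc^S \\
[\![\neg\varphi]\!]^S & = & [\![\varphi]\!]^S \text{ is false} \\
[\![\varphi_1 \wedge \varphi_2]\!]^S & = & [\![\varphi_1]\!]^S \text{ is true and } [\![\varphi_2]\!]^S \text{ is true} \\
[\![\varphi_1 \vee \varphi_2]\!]^S & = & [\![\varphi_1]\!]^S \text{ is true or } [\![\varphi_2]\!]^S \text{ is true} \\
[\![\varphi_1 \Rightarrow \varphi_2]\!]^S & = & [\![\neg\varphi_1]\!]^S \text{ is true or } [\![\varphi_2]\!]^S \text{ is true}
\end{array}
$$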
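
The store model and the pointwise parts of this semantics translate directly into code. The sketch below is illustrative only and not from the paper: it assumes a store with two selectors (car and cdr), encodes locations as small integers and the undefined value as -1, and all names (Store, follow, allocated, heap_shared, BOTTOM, MAXLOC) are hypothetical. It covers pointer expressions and the heap-sharing and allocation tests at a single location; quantification over the words of a routing expression is not modeled.

```c
#include <stdbool.h>

#define BOTTOM (-1)   /* the undefined value; locations are 0 .. nloc-1 */
#define MAXLOC 8      /* arbitrary bound chosen for this sketch */

enum { CAR, CDR, NSEL };   /* the selector set, here {car, cdr} */

typedef struct {
    int nloc;                   /* number of locations in the store */
    int field[MAXLOC][NSEL];    /* field[l][s]: value of selector s at location l */
} Store;

/* Selector application, strict in BOTTOM: following a field of an
   undefined value yields an undefined value. */
int follow(const Store *st, int l, int s)
{
    return l == BOTTOM ? BOTTOM : st->field[l][s];
}

/* Allocation test: the value denotes a location. */
bool allocated(int l)
{
    return l != BOTTOM;
}

/* Heap sharing at l: at least two distinct (cell, selector) pairs point
   to l, i.e. two different cells, or two different fields of one cell. */
bool heap_shared(const Store *st, int l)
{
    int incoming = 0;
    if (l == BOTTOM)
        return false;
    for (int c = 0; c < st->nloc; c++)
        for (int s = 0; s < NSEL; s++)
            if (st->field[c][s] == l)
                incoming++;
    return incoming >= 2;
}
```

For a variable x bound to location 1, the nested call follow(&st, follow(&st, 1, CDR), CAR) computes the meaning of the pointer expression x.cdr.car, and strictness makes the result BOTTOM as soon as any prefix is undefined.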



3 Representing Path Matrices via Formulae 

In this section, we study the relationship between the logic Lr and a variant of 
the path-matrix structure-description formalism [9,11]. A path matrix records 
information about the (possibly empty) set of paths that exist between pairs of 
pointer variables in a program. The version of path matrices described below is 
a generalization of the original version described in [9, 11]. We show that every 
path matrix (of the extended version of the formalism) can be represented by a 
formula in logic Lr. 

Definition 3.1 A path matrix pm contains an entry pm[x, y] for every pair of 
pointer-valued program variables, x and y. An entry pm[x, y] describes the set of 
paths from the cell pointed to by x to the cell pointed to by y. An entry pm[x, y] 
has a value of the form $\langle R, Q\rangle$, where $R$ is a regular expression over $\Sigma$, and $Q$ is 
either “P” (standing for “possible path”) or “D” (standing for “definite path”). 

The notions of “possible paths” and “definite paths” are somewhat subtle 
(and the names “possible paths” and “definite paths”, which we have adopted 
from [9, 11], are somewhat misleading). In the discussion below, let $S$ be a store 
that path matrix pm represents, and let $Paths_S(x, y)$ denote the set of paths 
from the cell pointed to by program variable x to the cell pointed to by y. 

— An entry pm[x, y] that has the value $\langle R_D, D\rangle$ means that there is a path $p$ 
in $S$ from the cell pointed to by program variable x to the cell pointed to by 
y, such that $p \in L(R_D)$. In other words, 

$$Paths_S(x, y) \cap L(R_D) \neq \emptyset \qquad (3)$$

Note that only one of the paths in $L(R_D)$ need be a path in $Paths_S(x, y)$ 
for pm[x, y] = $\langle R_D, D\rangle$ to be satisfied. 

— An entry pm[x, y] that has the value $\langle R_P, P\rangle$ means that $L(R_P)$ is an 
over-approximation to the set of paths from the cell pointed to by x to the cell 
pointed to by y. In other words, 

$$Paths_S(x, y) \subseteq L(R_P). \qquad (4)$$

An alternative way to think about this is as follows: What we really mean 
by “$R_P$ represents possible paths in store $S$” is that $\overline{L(R_P)} = \Sigma^{*} - L(R_P)$ is 
a set of impossible paths of $S$: That is, an entry pm[x, y] that has the value 
$\langle R_P, P\rangle$ means that none of the paths from the cell pointed to by x to the 
cell pointed to by y are in $\overline{L(R_P)}$. Thus, we have 

$$Paths_S(x, y) \cap \overline{L(R_P)} = \emptyset. \qquad (5)$$

These two ways of looking at things are equivalent, as shown by the following 
derivation: $Paths_S(x, y) \cap \overline{L(R_P)} = \emptyset \iff Paths_S(x, y) - L(R_P) = \emptyset \iff 
Paths_S(x, y) \subseteq L(R_P)$. 

It is instructive to consider some simple examples of possible path-matrix 
entries: 

— An entry pm[x, y] that has the value $\langle\emptyset, P\rangle$ represents the fact that there 
is no path in $S$ from the cell pointed to by program variable x to the cell 
pointed to by y. 

— An entry pm[x, y] that has the value $\langle\epsilon, D\rangle$ represents the fact that x and y 
are must-aliases, i.e., x and y must point to the same cell in all of the stores 
that the path matrix represents. 

— In contrast, an entry pm[x, y] with the value $\langle\epsilon, P\rangle$ represents the fact that 
x and y are may-aliases, i.e., x and y might point to the same cell in some 
of the stores that the path matrix represents, but it is also possible that in 
other stores that the path matrix represents, there is no path at all from the 
cell pointed to by x to the cell pointed to by y. 

— More generally, a value $\langle R, P\rangle$ for entry pm[x, y], where $\epsilon \in L(R)$, means that 
x and y are may-aliases. The language $L(R) - \{\epsilon\}$ represents other possible 
paths from the cell pointed to by x to the cell pointed to by y, but it is also 
possible that in some of the stores that the path matrix represents, there is 
no path at all from the cell pointed to by x to the cell pointed to by y. 

Note that a path matrix represents a smaller set of stores if the language for 
a “D” entry is made smaller, and also if the language for a “P” entry is made 
smaller (see (3) and (4)). 

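Example 3.2 The following path matrix expresses a loop-invariant for the loop 
of elem_reverse: 

$$
\begin{array}{c|ccc}
pm & x & y & z \\ \hline
x & \langle\epsilon, D\rangle & \langle\emptyset, P\rangle & \langle\emptyset, P\rangle \\
y & \langle\emptyset, P\rangle & \langle\epsilon, D\rangle & \langle cdr, D\rangle \\
z & \langle\emptyset, P\rangle & \langle\emptyset, P\rangle & \langle\epsilon, D\rangle
\end{array}
\qquad (6)
$$

The fact that pm[y, z] is $\langle cdr, D\rangle$ signifies that y->cdr must point to the cell 
that z points to. 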

Example 3.3 The following path matrix expresses a loop-invariant for the loop 
of elem_delete: 

$$
\begin{array}{c|ccc}
pm & prev & elem & c \\ \hline
prev & \langle\epsilon, D\rangle & \langle cdr, P\rangle & \langle\emptyset, P\rangle \\
elem & \langle\emptyset, P\rangle & \langle\epsilon, D\rangle & \langle\emptyset, P\rangle \\
c & \langle cdr^{*}, P\rangle & \langle\epsilon \mid cdr^{*}.prev?.cdr, P\rangle & \langle\epsilon, D\rangle
\end{array}
\qquad (7)
$$

The fact that pm[prev, elem] is $\langle cdr, P\rangle$ signifies that prev->cdr may point 
to the cell pointed to by elem, but may also point to a cell that elem does 
not point to; in fact, the latter is the case at the beginning of the first loop 
iteration. Similarly, the fact that pm[c, prev] is $\langle cdr^{*}, P\rangle$ signifies that prev may 
be reachable from c. The fact that the pm[c, elem] entry is $\langle\epsilon \mid cdr^{*}.prev?.cdr, P\rangle$ 
signifies that either c and elem point to the same cell, or else that as we traverse 
the list pointed to by c, we first reach a cell pointed to by prev and then the 
cell pointed to by elem. 

Remark. The routing expressions that we allow in path matrices are more 
general than the ones allowed in [9, 11] in the following ways: 

— We allow arbitrary alternations and not just car|cdr. 

— We follow [13] in allowing pointer-variable interrogations (e.g., prev?, $\neg$prev?) 
in routing expressions. This comes in handy in cases where several paths 
depend on each other (cf. the pm[c, elem] entry in path matrix (7)). 

(The use of a less-general language of routing expressions in [9, 11] was motivated 
by the need to be able to compute efficiently a safe approximation to the path 
matrix at every program point.) 

Since path matrices are an intuitive notation, we will not spend the space 
in directly formalizing the meaning of path matrices in terms of sets of stores. 
Instead, we now define the meaning of a path matrix by a formula in our language 
that characterizes the set of stores that a path matrix represents. 

Definition 3.4 For a regular expression $R$, let $\overline{R}$ denote the complement of $R$, 
i.e., a regular expression such that $L(\overline{R}) = \overline{L(R)} = \Sigma^{*} - L(R)$. For a path matrix 
pm, we define the formula $\varphi_{pm}$ as follows: 

$$
\varphi_{pm} \stackrel{\text{def}}{=}
\bigwedge_{x, y \in PVar,\ \langle R, D\rangle = pm[x, y]} x\langle R\rangle y
\;\wedge\;
\bigwedge_{x, y \in PVar,\ \langle R, P\rangle = pm[x, y]} \neg x\langle\overline{R}\rangle y
\qquad (8)
$$

This definition is justified by the discussion that follows Definition 3.1. 

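Example 3.5 Path matrix (6), which expresses a loop-invariant for the loop of 
elem_reverse (see Example 3.2), corresponds to the following formula: 

$$
\begin{array}{ll}
 & x\langle\epsilon\rangle x \wedge \neg x\langle\Sigma^{*}\rangle y \wedge \neg x\langle\Sigma^{*}\rangle z \\
\wedge & \neg y\langle\Sigma^{*}\rangle x \wedge y\langle\epsilon\rangle y \wedge y\langle cdr\rangle z \qquad (9) \\
\wedge & \neg z\langle\Sigma^{*}\rangle x \wedge \neg z\langle\Sigma^{*}\rangle y \wedge z\langle\epsilon\rangle z
\end{array}
$$

Formula (9) is less informative than the loop-invariant given as Formula (1) of 
Example 2.2. For example, with Formula (9) it is not known that x points to a 
list, because cyclic stores of the form shown in Figure 2 also satisfy (9). 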




Fig. 2. A store with a shared node. 



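Example 36 Path matrix (7), which expresses a loop-invariant for the loop of 
elem_delete (see Example 33), corresponds to the following formula: 

\[
\begin{array}{rl}
       & \mathit{prev}\langle\varepsilon\rangle \mathit{prev} \wedge \neg \mathit{prev}\langle\overline{\mathit{cdr}}\rangle \mathit{elem} \wedge \neg \mathit{prev}\langle\Sigma^*\rangle c \\
\wedge & \neg \mathit{elem}\langle\Sigma^*\rangle \mathit{prev} \wedge \mathit{elem}\langle\varepsilon\rangle \mathit{elem} \wedge \mathit{elem}\langle\Sigma^*\rangle c \\
\wedge & \neg c\langle\overline{\mathit{cdr}^*}\rangle \mathit{prev} \wedge \neg c\langle\overline{\varepsilon \mid \mathit{cdr}^*.\mathit{prev}?.\mathit{cdr}}\rangle \mathit{elem} \wedge c\langle\varepsilon\rangle c
\end{array}
\tag{10}
\]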

Formula (10) is less informative than the loop-invariant given as Formula (2) of 
Example 23. In contrast to Formula (2), Formula (10) cannot be used to conclude 
that the use of free in elem_delete is correct; i.e., we cannot conclude that the 
location released by the statement free(elem) cannot be pointed to by c. 

4 Representing Shape Graphs via Formulae 

In this section, we study a structure-description formalism called static shape 
graphs, which, in addition to reachability information, allow certain “topologi- 
cal” properties of stores to be represented. There are many ways to define static 
shape graphs. For simplicity, we only consider the variant of static shape graphs 
defined in [21]. In Section 4.1, we give a formal definition of static shape graphs. 
Then, in Section 4.2, we construct a formula in Lr that exactly characterizes the 
set of stores represented by a static shape graph. 

4.1 Static Shape Graphs 

Below, we formally define static shape graphs. Unlike the stores defined in 
Section 2, static shape graphs are of an a priori bounded size, i.e., the number 
of shape nodes depends only on the size of the program being analyzed. This 
bound is needed so that an iterative shape-analysis algorithm that computes 
static shape graphs for each program point will terminate. 

Definition 41 A static-shape-graph (SSG) is a finite directed graph that 
consists of two kinds of nodes, variables (i.e., PVar) and shape-nodes, and 
two kinds of edges, variable-edges and selector-edges. A shape-graph is 
represented by a quadruple $\langle \mathit{shapeNodes}, E_v, E_s, \mathit{is}\rangle$, where: 

— shapeNodes is a finite set of shape nodes. Every shape node 
$n \in \mathit{shapeNodes}$ has the form $n = n_X$ where $X \subseteq \mathit{PVar}$. 
Such a node describes the cells that are simultaneously pointed to by all the 
pointer variables in $X$. 
Graphically, we denote shape nodes by circles. The node $n_\emptyset$ is the 
“summary-node” since it represents all the cells that are not directly pointed 
to by any pointer variable, and therefore it is represented by a dotted circle. 

— $E_v$ is the graph’s set of variable-edges, each of which is denoted by a pair 
of the form $[x, n_X]$, where $x \in \mathit{PVar}$ and $n_X \in \mathit{shapeNodes}$. 
We assume that for every $x \in \mathit{PVar}$, at most one variable-edge 
$[x, n_X] \in E_v$ exists and $x \in X$. 
Graphically, we denote variable-edges by solid edges since they must exist. 

— $E_s$ is the graph’s set of selector-edges, each of which is denoted by a triple 
of the form $\langle n_X, \mathit{sel}, n_Y\rangle$, where $n_X, n_Y \in \mathit{shapeNodes}$ 
and $\mathit{sel} \in \{\mathit{car}, \mathit{cdr}\}$. We assume that for every 
$x \in \mathit{PVar}$, $\mathit{sel} \in \{\mathit{car}, \mathit{cdr}\}$, and shape node 
$n_X$ such that $[x, n_X] \in E_v$, at most one selector-edge 
$\langle n_X, \mathit{sel}, n_Y\rangle \in E_s$ exists. In contrast, there may be many 
selector-edges $\langle n_\emptyset, \mathit{sel}, n_Y\rangle \in E_s$ corresponding 
to different selector-edges emanating from cells represented by $n_\emptyset$. 
Graphically, we denote selector-edges by dotted edges since they may or may 
not exist. 

— is (standing for “is shared”) is a function of type 
$\mathit{shapeNodes} \to \{\mathit{false}, \mathit{true}\}$. 
It serves as a constraint to restrict the set of stores represented by a shape 
graph. When $n_\emptyset$ has more than one incoming selector-edge and yet 
$\mathit{is}(n_\emptyset) = \mathit{false}$, we know that, for any memory cell c 
represented by $n_\emptyset$, at most one of the concrete representatives of 
these selector-edges can be an incoming edge of c. 
Graphically, we denote the fact that $n_X$ is a shared node by putting 
“$\mathit{is}(n_X)$” inside the circle. 
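
The quadruple of Definition 41 maps naturally onto a small record type. The sketch below is illustrative only: the class name `SSG`, the field names, and the choice to encode a shape node $n_X$ as the frozen set $X$ are assumptions, not the paper's notation. The `check` method tests the structural constraints stated in the definition.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

ShapeNode = FrozenSet[str]          # assumed encoding: node n_X is the set X
SUMMARY: ShapeNode = frozenset()    # the summary node (X is empty)


@dataclass
class SSG:
    shape_nodes: Set[ShapeNode]
    var_edges: Dict[str, ShapeNode]                    # E_v, one edge per variable
    sel_edges: Set[Tuple[ShapeNode, str, ShapeNode]]   # E_s
    is_shared: Dict[ShapeNode, bool] = field(default_factory=dict)

    def check(self) -> None:
        """Raise ValueError if a constraint of the definition is violated."""
        for x, node in self.var_edges.items():
            # [x, n_X] requires n_X to be a shape node with x in X.
            if node not in self.shape_nodes or x not in node:
                raise ValueError(f"bad variable-edge for {x!r}")
        pointed = set(self.var_edges.values())
        seen = set()
        for src, sel, dst in self.sel_edges:
            if sel not in ("car", "cdr"):
                raise ValueError(f"unknown selector {sel!r}")
            if src not in self.shape_nodes or dst not in self.shape_nodes:
                raise ValueError("selector-edge uses an unknown shape node")
            # Nodes pointed to by a variable have at most one edge per selector;
            # other nodes, such as the summary node, may have many.
            if src in pointed:
                if (src, sel) in seen:
                    raise ValueError("duplicate selector-edge from a pointed-to node")
                seen.add((src, sel))
```

Using a dictionary for `var_edges` enforces "at most one variable-edge per variable" by construction, so `check` only has to test the remaining conditions.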



Example 42 The SSG that represents the store shown in Figure 2 is shown in 
Figure 3. 




Fig. 3. The SSG that corresponds to the store shown in Figure 2. 



4.2 From a Static Shape Graph to an Lr Formula 

We are now ready to show how to construct the formula that captures the 
meaning of a static shape graph. 

Definition 43 Let $SG = \langle \mathit{shapeNodes}, E_v, E_s, \mathit{is}\rangle$ be an SSG. 
We define the graph $\widehat{SG}$ to be a directed graph, 
$\widehat{SG} = \langle N, A, l\rangle$, with edges labeled by letters in $\Sigma$, where: 

— $N$ contains two nodes $u.\mathit{in}$ and $u.\mathit{out}$ for every shape node $u$ 
in shapeNodes. 

— $A \subseteq N \times N$ contains the following two types of labeled edges: 

• For every shape node $n_X$ such that $[p_1, n_X], [p_2, n_X], \ldots, [p_n, n_X] \in E_v$ 
and $[p_{n+1}, n_X], [p_{n+2}, n_X], \ldots, [p_{n+k}, n_X] \notin E_v$, there is an edge 
$\langle n_X.\mathit{in}, n_X.\mathit{out}\rangle$, labeled by 
$(p_1?.p_2?.\cdots.p_n?.\neg p_{n+1}?.\neg p_{n+2}?.\cdots.\neg p_{n+k}?)$. 

• If there is a selector-edge $\langle n_X, \mathit{sel}, n_Y\rangle \in E_s$, there is an 
edge $a = \langle n_X.\mathit{out}, n_Y.\mathit{in}\rangle$ from $n_X.\mathit{out}$ into 
$n_Y.\mathit{in}$, labeled sel. 

— $l : A \to \Sigma$ maps edges into their labels. 

For any two nodes $n_X, n_Y \in \mathit{shapeNodes}$, let 
$r_{n_X.\mathit{in} \to n_Y.\mathit{out}}$ be the regular path expression over $\Sigma$ 
that describes paths in $\widehat{SG}$ from $n_X.\mathit{in}$ into $n_Y.\mathit{out}$ 
(which can be computed by well-known methods, e.g., [25, 24]). For a finite set of 
regular expressions $S = \{r_1, r_2, \ldots, r_n\}$, $\sum_{r \in S} r$ denotes the 
regular expression $r_1 \mid r_2 \mid \cdots \mid r_n$. Finally, for a regular expression 
$r$, $\overline{r}$ is the regular expression over $\Sigma$ that describes the words 
that are not in $r$, i.e., $L(\overline{r}) = \Sigma^* - L(r)$. Let us define the 
following formulae to characterize the different aspects of SG: 

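\[
\varphi_{val}^{+} \stackrel{\mathrm{def}}{=} \bigwedge_{x \in \mathit{PVar},\ [x, n_X] \in E_v} al(x)
\qquad
\varphi_{val}^{-} \stackrel{\mathrm{def}}{=} \bigwedge_{x \in \mathit{PVar},\ [x, n_X] \notin E_v} \neg al(x)
\]

\[
\varphi_{veq}^{+} \stackrel{\mathrm{def}}{=} \bigwedge_{x,y \in \mathit{PVar},\ [x, n_X], [y, n_X] \in E_v} x = y
\qquad
\varphi_{veq}^{-} \stackrel{\mathrm{def}}{=} \bigwedge_{x,y \in \mathit{PVar},\ [x, n_X] \in E_v,\ [y, n_X] \notin E_v} x \neq y
\]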

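\[
\varphi_{pal} \stackrel{\mathrm{def}}{=} \bigwedge_{x \in \mathit{PVar},\ [x, n_X] \in E_v}
\neg al\Big(x\Big\langle \overline{\textstyle\sum_{n_Y \in \mathit{shapeNodes}} r_{n_X.\mathit{in} \to n_Y.\mathit{out}}} \Big\rangle\Big)
\]

\[
\varphi_{reach} \stackrel{\mathrm{def}}{=} \bigwedge_{[x, n_X], [y, n_Y] \in E_v}
\neg x\big\langle \overline{r_{n_X.\mathit{in} \to n_Y.\mathit{out}}} \big\rangle y
\]

\[
\varphi_{hs} \stackrel{\mathrm{def}}{=} \bigwedge_{x \in \mathit{PVar},\ [x, n_X] \in E_v}
\neg hs\Big(x\Big\langle \textstyle\sum_{n_Y \in \mathit{shapeNodes},\ \mathit{is}(n_Y) = \mathit{false}} r_{n_X.\mathit{in} \to n_Y.\mathit{out}} \Big\rangle\Big)
\]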
Finally, the formula $\Phi[SG]$ is the conjunction of these formulae. 

Lemma 44 For every store $S$ and SSG $SG$, $S$ is represented by $SG$ if and only 
if $\Phi[SG]$ is true in $S$. 
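
The auxiliary labeled graph of Definition 43, with one "in" and one "out" copy per shape node, can be built mechanically. The following is a minimal sketch, assuming shape nodes are encoded as frozen sets of variable names and that a negated interrogation is written `!p?`; the function name `auxiliary_graph` and the tuple encoding of edges are hypothetical. Computing the regular path expressions between graph nodes is left out.

```python
def auxiliary_graph(shape_nodes, var_edges, sel_edges, pvars):
    """Return the labeled edges (source, label, target) of the auxiliary graph.

    shape_nodes: set of frozensets; var_edges: dict variable -> shape node;
    sel_edges: set of (source node, selector, target node); pvars: all variables.
    """
    edges = []
    for node in sorted(shape_nodes, key=sorted):
        # Variables with a variable-edge to this node are interrogated
        # positively, all other variables negatively.
        pointing = [p for p in sorted(pvars) if var_edges.get(p) == node]
        others = [p for p in sorted(pvars) if var_edges.get(p) != node]
        label = ".".join([f"{p}?" for p in pointing] + [f"!{p}?" for p in others])
        edges.append(((node, "in"), label, (node, "out")))
    ordered = sorted(sel_edges, key=lambda e: (sorted(e[0]), e[1], sorted(e[2])))
    for src, sel, dst in ordered:
        # A selector-edge leaves the "out" copy and enters the "in" copy.
        edges.append(((src, "out"), sel, (dst, "in")))
    return edges
```

Splitting each shape node into two copies is what lets the interrogation label sit on an edge, so that every path through a node also checks which variables point to it.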

5 Extracting Information from Program Analyses via Lr Formulae 

Many interesting properties of linked data structures can be expressed as Lr 
formulae: 

— For example, the formula $F \stackrel{\mathrm{def}}{=} (x = x.\mathit{cdr})$ expresses 
the property “x points to a cell that has a self-cycle”. This information can be 
used by an optimizing compiler to determine whether it is profitable to generate 
a pre-fetch for the next element [18]. 

— It is possible to express in Lr that two pointer-access paths point to different 
memory cells (i.e., they are not may-aliases), which is important both for 
optimization and in tools for aiding software understanding. 

— The reachability and sharing predicates can also be useful, for example, to 
improve the performance of garbage-collection algorithms and to parallelize 
programs. 

In principle, Lr provides a uniform basis for using the results of analyses that 
yield either path matrices or static shape graphs in program optimizers and in 
tools for aiding software understanding. For instance, Figure 4 shows one of the 
SSGs SG that arises at the loop header in an analysis of elem_reverse. It can 
be shown that F is not satisfiable by any store that is represented by SG. This 
means that x does not point to a cell that has a self-cycle in any of the stores 
that SG represents. This can be determined automatically with our approach 
by showing that $\Phi[SG] \wedge F$ is not satisfiable. Similarly, by translating a path 
matrix M (obtained from a path-matrix-based program-analysis algorithm) into 
the corresponding Lr formula $\Phi[M]$ and checking whether $\Phi[M] \wedge F$ is satisfiable, 
one can verify automatically whether x could point to a cell that has a self-cycle 
in any of the stores represented by M. 




Fig. 4. An SSG, SG, that represents acyclic lists of length two or more that are pointed 
to by variable x. 
6 The Decidability of Lr 

Theorem 61 Lr is decidable. 

Sketch of Proof: Prior to directly approaching the question of decidability of 
Lr, one first proves a normalization lemma showing that the routing expressions 
mentioned in formulae can be rewritten in such a way that they deal only with 
paths that avoid all nodes pointed to by pointer expressions that are mentioned in 
the formula (i.e., pointer expressions that occur in some constraint or program 
variables that occur in some pointer-variable interrogation). That is, they assert 
only reachability of shared nodes or pointer expressions via paths that traverse 
nodes in the heap. One proves this normalization lemma by breaking down path 
expressions that may cross mentioned pointer expressions into component path 
expressions that do not cross mentioned pointer expressions. 

The decidability of logic Lr follows from showing that Lr has the bounded 
model property: that is, there is a computable numerical function $f$ such that 
any sentence $\varphi$ of Lr that is consistent has a model of size bounded by 
$f(|\varphi|)$. This technique is one of the most common in logical decision 
procedures [2]. It immediately shows the existence of a crude decision procedure: 
one enumerates all possible stores of size at most $f(|\varphi|)$, searching for a 
model. (Note that the approach sketched here is intended only to give a 
comprehensible demonstration of decidability, not to give a practical decision 
procedure.) The proof of the bounded model property proceeds by starting with an 
arbitrary concrete store $G$ satisfying a formula $\varphi$ and showing that $G$ can be 
diminished to a model of size at most $f(|\varphi|)$ (for a particular $f$ given in the 
proof) while preserving all atomic formulae in $\varphi$. 
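
The crude decision procedure mentioned above, enumerating all stores up to a size bound, can be illustrated with a toy search. The sketch makes several simplifying assumptions that are not from the paper: routing expressions are restricted to the form $w^*$ for a fixed selector word $w$, pointer-variable interrogations are not supported, a reachability constraint is taken to be false when either variable is NULL, and the bound is supplied by hand instead of being computed as $f(|\varphi|)$. The names `reaches`, `stores`, `find_model` and `phi` are hypothetical.

```python
from itertools import product


def reaches(store, x, word, y):
    """True iff y's cell is reached from x's cell by word^k for some k >= 0."""
    env, heap = store
    cur, target = env[x], env[y]
    if cur is None or target is None:       # None plays the role of NULL
        return False
    seen = set()
    while cur is not None and cur not in seen:
        if cur == target:
            return True
        seen.add(cur)
        for sel in word:                    # follow one copy of the word
            cur = heap[sel][cur]
            if cur is None:
                break
    return False


def stores(variables, n):
    """Enumerate every store with exactly n cells, numbered 0..n-1."""
    targets = [None] + list(range(n))
    for env_vals in product(targets, repeat=len(variables)):
        env = dict(zip(variables, env_vals))
        for car in product(targets, repeat=n):
            for cdr in product(targets, repeat=n):
                yield env, {"car": car, "cdr": cdr}


def find_model(formula, variables, bound):
    """Return (n, store) for the first model with at most bound cells, else None."""
    for n in range(bound + 1):
        for store in stores(variables, n):
            if formula(store):
                return n, store
    return None


def phi(store):
    """x<car*>y, x<(cdr.cdr)*>y, not x<(cdr.cdr.cdr)*>y, y<car*>z, y<(cdr.cdr.cdr)*>z."""
    return (reaches(store, "x", ["car"], "y")
            and reaches(store, "x", ["cdr", "cdr"], "y")
            and not reaches(store, "x", ["cdr", "cdr", "cdr"], "y")
            and reaches(store, "y", ["car"], "z")
            and reaches(store, "y", ["cdr", "cdr", "cdr"], "z"))


if __name__ == "__main__":
    print(find_model(phi, ["x", "y", "z"], 3))
```

Because each cell has at most one car and one cdr successor, following $w$ repeatedly is deterministic, so `reaches` terminates as soon as a cell repeats. The number of candidate stores grows exponentially with the bound, which is why such an enumeration only demonstrates decidability and is not a practical procedure.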

The normalization lemma above implies that in this shrinking process one 
only has to preserve properties that deal with paths through the heap (reacha- 
bility, heap-sharing, etc.) and equalities and inequalities between a fixed set of 
pointer expressions. This shrinking is then done in three phases: first, the original 
store G is “pruned” to get a model that is a union of trees: in the process, some 
information about the sharing of nodes is lost, but extra labels are added to the 
nodes to maintain this information. These “auxiliary labels” indicate that cer- 
tain nodes in the tree correspond to nodes associated with a particular pointer 
expression in the original store, and that certain nodes in the tree were shared 
in the original store. 

We then make use of classical decidability results on reachability expressions 
on finite trees ([26], summarized also in [2]) to shrink each of these trees to 
smaller trees that satisfy the same properties as the union of trees produced in 
stage one. The “properties” mentioned here are obtained by taking the original 
reachability, heap-sharing, and allocation constraints and transforming them to 
expressions in monadic second-order logic that express how to reach the auxiliary 
labels mentioned above. 

Finally, the shrunken set of trees is glued together to restore sharing 
information lost in the first phase: multiple nodes that have been annotated as 
associated with the same pointer expression are identified, and nodes that were 
annotated as being shared heap nodes are made into shared nodes. The 
normalization results are used in a crucial way in this glueing stage, since the 
glueing can create many new paths within the represented store. Glueing cannot, 
however, create new paths through the heap in any store, since the glueing process 
only identifies nodes associated with pointer expressions mentioned in the 
formula (or in unshared paths leading to such nodes). Since normalization implies 
that we are only concerned with preserving the existence and nonexistence of 
paths that lie strictly within the heap, this is sufficient. 

Figures 5 and 6 show how the proof might work for the formula 

\[
\Phi = x\langle \mathit{car}^*\rangle y \wedge x\langle(\mathit{cdr}.\mathit{cdr})^*\rangle y
\wedge \neg x\langle(\mathit{cdr}.\mathit{cdr}.\mathit{cdr})^*\rangle y
\wedge y\langle \mathit{car}^*\rangle z \wedge y\langle(\mathit{cdr}.\mathit{cdr}.\mathit{cdr})^*\rangle z .
\]

We start with a store in Figure 5 that satisfies $\Phi$, and then prune it into a set 
of trees. The auxiliary labels y' and y'' keep track of the fact that these nodes 
in the tree must at some point be pointed to by y. In Figure 6, the trees are 
decreased in size, while preserving analogs of the reachability statements: e.g., 
the node labeled y can reach a copy of the node z with a (cdr.cdr.cdr)* path, 
and x cannot reach a copy of y with a (cdr.cdr.cdr)* path. In the final stage, 
the tree-like model is glued together to form a traditional store that satisfies 
$\Phi$. 




Fig. 5. The pruning stage of the proof. (Panels: the original store; the auxiliary 
store produced in the pruning phase.) 



7 Related Work 

Jensen et al. have also defined a decidable logic for describing properties of linked 
data structures [13]. It is interesting to compare the two approaches: 

Fig. 6. The shrinking and glueing stages of the proof. (Panels: the result of 
shrinking; glueing together to give the final smaller store satisfying the formula.) 

— The logic of Jensen et al. allows quantifications on pointer expressions, which 
is forbidden in Lr. Instead, Lr allows stating both sharing constraints and 
allocation constraints. However, both of these can be encoded using their 
logic: 

• Sharing constraints can be encoded using quantifications. 

• Allocation constraints can be encoded using tests for NULL in routing 
expressions. 

— Lr imposes a limitation on routing expressions by forbidding testing for NULL 
and for garbage cells. 

— On the other hand, Lr generalizes the logic of Jensen et al. in the following 
ways: 

• Lr allows multiple selectors, which enables Lr formulae to describe 
properties of general directed graphs as opposed to just lists.¹ 

• The reachability constraints in Lr formulae allow one to test simultaneous 
pointer inequalities, which is crucial for capturing the strength of the 
variant of static shape graphs defined in [21]. 

¹ [13] sketches an extension of their technique to trees, which involves multiple 
selectors, but they do not handle general directed graphs. 

In summary, the formulae of Jensen et al. are more expressive than Lr formulae, 
but they can only state properties of lists and trees, whereas Lr can state 
properties of arbitrary graph data structures. 

Klarlund and Schwartzbach defined a language for defining graph types, which 
are tree data structures with non-tree links defined by auxiliary tree-path 
expressions [16]. In the application they envision, a programmer would be able 
to declare variables of a given graph type, and write code to mutate the “tree 
backbone” of these structures. After a mutation operation, the runtime system 
would automatically apply an update operation that they define, which updates 
the non-tree links. The graph-type definition language is unsuitable for 
describing arbitrary store graphs, and the fact that the update operations are 
limited does not allow the programmer to write arbitrary pieces of code. (However, 
the latter property is a significant advantage for the intended application: a 
programming language supporting controlled destructive updating of graph data 
structures.) 

The ADDS formalism of Hendren et al. is an annotation language for ex- 
pressing loop invariants and pre- and post-conditions of statements and proce- 
dures [10]. From a programmer’s point of view, an advantage of a logic like Lr 
over ADDS is that Lr is strong enough to allow stating properties of the kind 
that arise at intermediate points of a procedure, when a data structure is in the 
process of being traversed or destructively updated. For example, ADDS can- 
not be used to state the loop invariant of Example 22 because the relationship 
between x and y cannot be expressed. Because it is lacking in expressive power, 
ADDS is mainly useful as a documentation notation for type definitions, func- 
tion arguments, and function return values. Hendren et al. propose to handle 
this limitation of ADDS by extending it with the ability to use a certain limited 
class of reachability properties between variables (of the kind used in the path 
matrices defined in [9]). 

Lr goes beyond ADDS in the following ways: 

— Lr permits stating properties of cyclic data structures. 

— The routing expressions used in Lr formulae are general regular expressions 
(with pointer-variable interrogations). 

— Lr is closed under both conjunction and negation. In contrast, ADDS cannot 
express the loop invariant in Example 23 because of the implication. 

It should be noted that currently the notion of heap sharing in Lr is weaker 
than the ADDS notion of “dimension” . It is easy to generalize Lr to include this 
concept without affecting its decidability. We did not do so in this paper because 
we wanted to stay with two selectors. 

Finally, it should be noted that neither Lr nor ADDS allows stating 
connectivity properties of the form $x\langle R_1\rangle = y\langle R_2\rangle$. We believe 
that Lr can be generalized to handle this. (A limited form of such connectivity 
properties, restricted to be in the form 
$x\langle(\mathit{car}|\mathit{cdr})^*\rangle = y\langle(\mathit{car}|\mathit{cdr})^*\rangle$, 
was proposed in [8, 7].) 

Lr is incomparable to Deutsch’s symbolic aliases [4, 5]: symbolic aliases allow 
the use of full-blown arithmetic, which cannot be used in a decidable logic. 
On the other hand, symbolic-alias expressions are not closed under negation. 
For instance, there is no way to express must-alias relationships using symbolic 
aliases. Thus, the loop invariant used in Example 23 cannot be expressed with 
symbolic aliases. 

In [6], Fradet and Le Metayer use graph grammars to express interesting 
properties of the data structures of a C-like language. Graph grammars can be a 
more natural formalism than logic for describing certain topological properties 
of stores. However, graph grammars are not closed under intersection and 
negation, and problems such as the inclusion problem are not decidable. In terms 
of expressive power, the structure-description formalism of [6] is incomparable to 
the one proposed in the present paper. 

It should be noted that the approach given here is limited in several ways: 
The approach we have taken is to develop decidable, logic-based languages for 
capturing topological properties of a broad class of linked data structures. Un- 
decidability results in predicate logic give many hard limitations on the expres- 
siveness of such languages: For example, no such language exists that is closed 
under first-order quantification and boolean connectives. Although logic-based 
formalisms can be more succinct in expressing properties of linked data struc- 
tures, they can also be more verbose; in particular, the output from our transla- 
tion algorithms can be significantly more verbose than the input. For example, 
with the translation from a static shape graph SG into the Lr formula $\Phi[SG]$ 
given in Section 4.2, the size of $\Phi[SG]$ can be exponential in $|SG|$. 

There are a few properties that cannot be expressed in Lr, including: 
(i) whether a store contains a garbage cell (i.e., a cell not accessible from any 
variable), and (ii) whether a tree is balanced (or almost balanced, such as the 
condition used in AVL trees). It may be difficult to extend Lr to handle these 
sorts of properties. However, such properties go well beyond the scope of current 
optimizing compilers and tools for aiding software understanding. 
Abstract. Control Flow Analysis is a widely used approach for analysing 
functional and object-oriented programs. As applications become more 
demanding, the analysis also needs to be more precise in its ability 
to deal with mutable state (or side-effects) and to perform polyvariant 
(or context-sensitive) analysis. Several insights in Data Flow Analysis 
and Abstract Interpretation show how to do so for imperative programs 
but the techniques have not had much impact on Control Flow Anal- 
ysis. We show how to incorporate a number of key insights from Data 
Flow Analysis (involving such advanced interprocedural techniques as 
call strings and assumption sets) into Control Flow Analysis (using Ab- 
stract Interpretation to induce the analyses from a collecting semantics). 



1 Introduction 

Control Flow Analysis. The primary aim of Control Flow Analysis is to deter- 
mine the set of functions that can be called at each application (e.g. x e where x 
is a formal parameter to some function) and has been studied quite extensively 
([24,11,16] to cite just a few). In terms of paths through the program, one tries 
to avoid working with a complete flow graph where all call sites are linked to 
all function entries and where all function exits are linked to all return sites. 
Often this is accomplished by means of contours [25] (a la call strings [23] or 
tokens [12]) so as to improve the precision of the information obtained. One way 
to specify the analysis is to show how to generate a set of constraints [8,9,18,19] 
whose least solution is then computed using graph-based ideas. However, the 
majority of papers on Control Flow Analysis (e.g. [24,25,11,16]) do not consider 
side-effects — a notable exception being [10]. 

Data Flow Analysis. The intraprocedural fragment of Data Flow Analysis 
ignores procedure calls and usually formulates a number of data flow equations 
whose least solution is desired (or sometimes the greatest when a dual ordering 
is used) [7]. It follows from Tarski’s theorem [26] that the equations could equally 
well be presented as constraints: the least solution is the same. 

The interprocedural fragment of Data Flow Analysis takes procedure calls into 
account and aims at treating calls and returns more precisely than mere goto’s: if 
a call site gives rise to analysing a procedure with a certain piece of information, 
then the resulting piece of information holding at the procedure exit should 
ideally only be propagated back to the return site corresponding to the actual 
call site (see Figure 1). 

In other words, the intraprocedural view is that all paths through a program are 
valid (and this set of paths is a regular language), whereas the interprocedural 
view is that only those paths will be valid where procedure entries and exits 
match in the manner of parentheses (and this set of paths is a proper context-free 
language). Most papers on Data Flow Analysis (e.g. [23,13]) do not consider 
first-class procedures and therefore have no need for a component akin to 
Control Flow Analysis; a notable exception to this is [20]. 

One approach deals with the interprocedural analysis by obtaining transfer 
functions for entire call statements [23,13] (and to some extent [3]). 
Alternatively, and as we shall do in this paper, one may dispense with formulating 
equations (or constraints) at the function level and extend the space of 
properties to include explicit context information. 

— A widely used approach modifies the space of properties to include informa- 
tion about the pending procedure calls so as to allow the correct propagation 
of information at procedure exits even when taking a mainly intraprocedural 
approach; this is often formulated by means of call strings [23,27]. 

— A somewhat orthogonal approach modifies the space of properties to include 
information that is dependent on the information that was valid at the last 
procedure entry [20,14,21]; an example is the use of so-called assumption 
sets that give information about the actual parameters. 
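
The "matching parentheses" view of valid interprocedural paths, and the role of the pending procedure calls, can be made concrete with a short check. This is a minimal sketch under assumptions that are not from the paper: a path is encoded as a list of `("call", site)`, `("ret", site)` and `("step", node)` events, unreturned calls are allowed at the end of a path, and a return with no pending call is rejected. The name `is_valid_path` is hypothetical.

```python
def is_valid_path(events):
    """Check that every return matches the most recent pending call site."""
    pending = []                      # stack of call sites not yet returned from
    for kind, site in events:
        if kind == "call":
            pending.append(site)
        elif kind == "ret":
            # A return is only valid towards the site of the latest pending call.
            if not pending or pending.pop() != site:
                return False
        # Any other event (an intraprocedural step) leaves the stack unchanged.
    return True


if __name__ == "__main__":
    print(is_valid_path([("call", 1), ("call", 2), ("ret", 2), ("ret", 1)]))
    print(is_valid_path([("call", 1), ("call", 2), ("ret", 1)]))
```

The `pending` stack is exactly the information a call string records; an analysis that keeps only its most recent entries obtains a bounded approximation of it.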




Fig. 1. Function call. 



Abstract Interpretation. In Abstract Interpretation [4], the systematic 
development of program analyses is likely to span a spectrum from abstract 
specifications (like [16] in the case of Control Flow Analysis), over 
syntax-directed specifications (as in the present paper), to actual 
implementations in the form of constraints being generated and subsequently solved 
(as in [8,9,18,19,6]). The main advantage of this approach is that semantic issues 
can be ignored in later stages once they have been dealt with in earlier stages. 
The first stage, often called the collecting semantics, is intended to cover a 
superset of the semantic considerations that are deemed of potential relevance for 
the analysis at hand. The purpose of each subsequent stage is to incorporate 
additional implementation-oriented detail so as to obtain an analysis that 
satisfies the given demands on efficiency with respect to the use of time and 
space. 

Aims. This paper presents an approach to program analysis that allows the si- 
multaneous formulation of techniques for Control and Data Flow Analysis while 
taking the overall path recommended by Abstract Interpretation. To keep the 
specification compact we present the Control Flow Analysis in the form of a 
succinct flow logic [17]. Throughout the development we maintain a clear sep- 
aration between environment-like data and store-like data so that the analysis 
more clearly corresponds to the semantics. As in [10] we add components for 
tracking the side-effects occurring in the program and for explicitly propagating 
environments; for the side-effects this gives rise to a flow-sensitive analysis and 
for the environments we might coin the term scope-sensitive. 

The analysis makes use of mementoes (for expressing context information in the 
manner of [5]) that are general enough that both call string based approaches 
(e.g. [23,25]) and dependent data approaches (in the manner of assumption- 
sets [20,14]) can be obtained by merely approximating the space of mementoes; 
this gives rise to a context-sensitive analysis. The mementoes themselves are 
approximated using a surjective function and this approach facilitates describing 
the approximations between the various solution spaces using Galois connections 
as studied in the framework of Abstract Interpretation [3,4,1]. 

Overview. Section 2 presents the syntax of a functional language with side- 
effects. Section 3 specifies the abstract domains and Section 4 the analysis itself. 
In Section 5 we then show how the classical developments mentioned above can 
be obtained as Abstract Interpretations. Finally, Section 6 concludes. — The 
full version of this paper is available as a technical report which establishes the 
correctness of the analysis and contains the proofs of the main results. 



2 Syntax 

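We shall study a functional language with side-effects in the style of Standard 
ML [15]. It has variables $x, f \in \mathit{Var}$, expressions $e$ and constants $c$ 
given by: 

\[
\begin{array}{rcl}
e & ::= & c \mid x \mid \mathtt{fn}_\pi\ x \Rightarrow e \mid \mathtt{fun}_\pi\ f\ x \Rightarrow e
          \mid (e_1\ e_2)^l \mid e_1 ; e_2 \mid \mathtt{ref}_\varpi\ e \\
  & \mid & {!e} \mid e_1 := e_2 \mid \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2
          \mid \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \\
c & ::= & \mathtt{true} \mid \mathtt{false} \mid () \mid \cdots
\end{array}
\]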
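
\[
\begin{array}{rcl}
m \in \mathit{Mem} & = & \{\diamond\} \cup (\mathit{Lab} \times \mathit{Mem}) \times \widehat{\mathit{Val}} \times \widehat{\mathit{Store}} \times (\mathit{Pnt}_F \times \mathit{Mem}) \\
d \in \mathit{Data} & = & \cdots \ \text{(unspecified)} \\
(\pi, m_d) \in \mathit{Closure} & = & \mathit{Pnt}_F \times \mathit{Mem} \\
(\varpi, m_d) \in \mathit{Cell} & = & \mathit{Pnt}_R \times \mathit{Mem} \\
v \in \mathit{Val}_A & = & \mathit{Data} \cup \mathit{Closure} \cup \mathit{Cell} \\
W \in \widehat{\mathit{Val}} & = & \mathcal{P}(\mathit{Mem} \times \mathit{Val}_A) \\
R \in \widehat{\mathit{Env}} & = & \mathit{Var} \to \widehat{\mathit{Val}} \\
S \in \widehat{\mathit{Store}} & = & \mathit{Cell} \to \widehat{\mathit{Val}}
\end{array}
\]

Table 1. Abstract domains. 
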



Here $\mathtt{fn}_\pi\ x \Rightarrow e$ is a function that takes one argument and 
$\mathtt{fun}_\pi\ f\ x \Rightarrow e$ is a recursive function (named $f$) that also 
takes one argument. We have labelled all syntactic occurrences of function 
applications with a label $l \in \mathit{Lab}$, all defining occurrences of functions 
with a label $\pi \in \mathit{Pnt}_F$ and all defining occurrences of references with a 
label $\varpi \in \mathit{Pnt}_R$. 

In Appendix A the semantics is specified as a big-step operational semantics 
with environments $\rho$ and stores $\sigma$. The language has static scope rules and 
we give it a traditional call-by-value semantics using judgements of the form 
$\rho \vdash \langle e, \sigma_1\rangle \rightarrow \langle \omega, \sigma_2\rangle$. 
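
The labelled syntax can be represented as an abstract syntax tree. The sketch below is one possible encoding, not the authors' implementation: the class and field names are hypothetical, labels for functions and references are plain strings, and application labels are integers. The helper `application_labels` collects the labels of all applications in a term.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass(frozen=True)
class Expr:
    """Base class for all expressions."""

@dataclass(frozen=True)
class Const(Expr):
    value: object

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class Fn(Expr):                  # fn_pi x => e
    pi: str
    param: str
    body: Expr

@dataclass(frozen=True)
class Fun(Expr):                 # fun_pi f x => e (recursive)
    pi: str
    name: str
    param: str
    body: Expr

@dataclass(frozen=True)
class App(Expr):                 # (e1 e2)^l
    label: int
    fn: Expr
    arg: Expr

@dataclass(frozen=True)
class Seq(Expr):                 # e1 ; e2
    first: Expr
    second: Expr

@dataclass(frozen=True)
class Ref(Expr):                 # ref_varpi e
    varpi: str
    init: Expr

@dataclass(frozen=True)
class Deref(Expr):               # !e
    target: Expr

@dataclass(frozen=True)
class Assign(Expr):              # e1 := e2
    target: Expr
    value: Expr

@dataclass(frozen=True)
class Let(Expr):                 # let x = e1 in e2
    name: str
    bound: Expr
    body: Expr

@dataclass(frozen=True)
class If(Expr):                  # if e then e1 else e2
    cond: Expr
    then: Expr
    orelse: Expr

def application_labels(e: Expr) -> List[int]:
    """Collect application labels; sub-expressions first, then the node itself."""
    labels: List[int] = []
    for f in fields(e):
        child = getattr(e, f.name)
        if isinstance(child, Expr):
            labels.extend(application_labels(child))
    if isinstance(e, App):
        labels.append(e.label)
    return labels
```

Keeping the labels as ordinary fields means that an analysis can key its caches directly on `pi`, `varpi` and `label` without a separate numbering pass.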



3 Abstract Domains 

Mementoes. The analysis will gain its precision from the so-called mementoes 
(or contours or tokens). A memento $m \in \mathit{Mem}$ represents an approximation of 
the context of a program point: it will either be $\diamond$, representing the initial 
context where no function calls have taken place, or it will have the form 

\[
((l, m_h), W, S, (\pi, m_d))
\]

representing the context in which a function is called. The idea is that 

— $(l, m_h)$ describes the application point; $l$ is the label of the function 
application and $m_h$ is the memento at the application point, 

— $W$ is an approximation of the actual parameter at the application point, 

— $S$ is an approximation of the store at the application point, and 

— $(\pi, m_d)$ describes the function that is called; $\pi$ is the label of the 
function definition and $m_d$ is the memento at the definition point of the function. 

Note that this is well-defined (in the manner of context-free grammars): 
composite mementoes are constructed from simpler mementoes and in the end from 
the initial memento $\diamond$. This definition of mementoes is akin to the contexts 
considered in [5]; in Section 5 we shall show how the set can be simplified into 
something more tractable. 







Flemming Nielson and Hanne Riis Nielson 



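\[
\begin{array}{rcl}
\mathcal{R}^d_F, \mathcal{R}^c_F \in \mathit{RCache}_F & = & \mathit{Pnt}_F \to \widehat{\mathit{Env}} \\
\mathcal{M}_F \in \mathit{MCache}_F & = & \mathit{Pnt}_F \to \mathcal{P}(\mathit{Mem}) \\
\mathcal{W}_F \in \mathit{WCache}_F & = & (\bullet\mathit{Pnt}_F \cup \mathit{Pnt}_F\bullet) \to \widehat{\mathit{Val}} \\
\mathcal{S}_F \in \mathit{SCache}_F & = & (\bullet\mathit{Pnt}_F \cup \mathit{Pnt}_F\bullet) \to \widehat{\mathit{Store}}
\end{array}
\]

Table 2. Caches. 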



Example 1. Consider the program “program” defined by: 

\[
((\mathtt{fn}_x\ x \Rightarrow ((x\ x)^1\ (\mathtt{fn}_y\ y \Rightarrow x))^2)\ (\mathtt{fn}_z\ z \Rightarrow z))^3
\]

The applications are performed in the order 3, 1 and 2. The mementoes of interest 
are going to be: $m_3 = ((3,\diamond), W_3, [\,], (x,\diamond))$, 
$m_1 = ((1,m_3), W_1, [\,], (z,\diamond))$, $m_2 = ((2,m_1), W_2, [\,], (z,\diamond))$, 
where $W_1$, $W_2$ and $W_3$ will be specified in Example 2 and $[\,]$ indicates that 
the store is empty. □ 

Abstract values. We operate on three kinds of abstract values: data, function 
closures and reference cells. Function closures and reference cells are represented 
as pairs consisting of the label ($\pi$ and $\varpi$, respectively) of the definition 
point and the memento $m_d$ at the definition point; this will allow us to distinguish 
between the various instances of the closures and reference cells. The abstract 
values will always come together with the memento (i.e. the context) in which they 
live so the analysis will operate over sets of pairs of mementoes and abstract 
values. The set $\widehat{\mathit{Val}}$ obtained in this way is equipped with the subset 
ordering (denoted $\subseteq$). The sets $\widehat{\mathit{Env}}$ and 
$\widehat{\mathit{Store}}$ of abstract environments and abstract stores, respectively, 
are now obtained in an obvious way and ordered by the pointwise extension of 
the subset ordering (denoted $\sqsubseteq$). 

Example 2. Continuing Example 1 we have 

\[
W_3 = \{(\diamond, (z,\diamond))\} \qquad W_1 = \{(m_3, (z,\diamond))\} \qquad W_2 = \{(m_1, (y,m_3))\}
\]

since the function z is defined at the top-level ($\diamond$) and y is defined inside the 
application 3. □ 
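
The mementoes and abstract values of Examples 1 and 2 can be written down as immutable records. The sketch below is an illustration under assumptions: the class name `Memento`, its field names, the use of `None` for the initial memento, and the encoding of an empty abstract store as an empty tuple are all hypothetical. The function `call_string` shows one way a memento could be projected onto a call string, optionally truncated to the `k` most recent applications; it is not the approximation function defined in the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

INITIAL = None   # stands for the initial memento (no calls have taken place)


@dataclass(frozen=True)
class Memento:
    label: int                    # l: label of the application
    m_h: Optional["Memento"]      # memento at the application point
    W: FrozenSet                  # abstract value of the actual parameter
    S: Tuple                      # abstract store (empty tuple = empty store)
    pi: str                       # label of the called function
    m_d: Optional["Memento"]      # memento at the function's definition point


def call_string(m, k=None):
    """List the application labels from the most recent call outwards."""
    labels = []
    while m is not None and (k is None or len(labels) < k):
        labels.append(m.label)
        m = m.m_h
    return labels


# The mementoes and abstract values of Examples 1 and 2.
W3 = frozenset({(INITIAL, ("z", INITIAL))})
m3 = Memento(3, INITIAL, W3, (), "x", INITIAL)
W1 = frozenset({(m3, ("z", INITIAL))})
m1 = Memento(1, m3, W1, (), "z", INITIAL)
W2 = frozenset({(m1, ("y", m3))})
m2 = Memento(2, m1, W2, (), "z", INITIAL)

if __name__ == "__main__":
    print(call_string(m2), call_string(m2, k=2))
```

Making the record frozen keeps mementoes hashable, so they can appear inside the sets that represent abstract values, mirroring the mutual recursion between mementoes and abstract values in Table 1.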

Caches. The analysis will operate on five caches associating information with 
functions; their functionality is shown in Table 2. The caches $\mathcal{R}^d_F$, 
$\mathcal{R}^c_F$ and $\mathcal{M}_F$ associate information with the labels $\pi$ of 
function definitions: 

— The environment caches $\mathcal{R}^d_F$ and $\mathcal{R}^c_F$: for each program 
point $\pi$, $\mathcal{R}^d_F(\pi)$ records the abstract environment at the definition 
point and $\mathcal{R}^c_F(\pi)$ records the same information but modified to each of 
the contexts in which the function body might be executed. As an example, the 
same value $v$ of a variable $x$ used in a function labelled $\pi$ may turn up in 
$\mathcal{R}^d_F(\pi)(x)$ as $(m_d, v)$ and in $\mathcal{R}^c_F(\pi)(x)$ as $(m_c, v)$, where 
$m_d = \diamond$ in case of a top-level function and 
$m_c = ((l,\diamond), W, S, (\pi,\diamond))$ in case of a top-level application $l$. 

— The memento cache $\mathcal{M}_F$: for each program point $\pi$, $\mathcal{M}_F(\pi)$ 
records the set of contexts in which the function body might be executed; so 
$\mathcal{M}_F(\pi) = \emptyset$ means that the function is never executed. 

The caches $\mathcal{W}_F$ and $\mathcal{S}_F$ associate information with function calls. 
For a function with label $\pi \in \mathit{Pnt}_F$ we shall use $\bullet\pi$ 
($\in \bullet\mathit{Pnt}_F$) to denote the point just before entering the body of the 
function, and we shall use $\pi\bullet$ ($\in \mathit{Pnt}_F\bullet$) to denote the point 
just after leaving the body of the function. The idea now is as follows: 

— The value cache $\mathcal{W}_F$: for each entry point $\bullet\pi$, 
$\mathcal{W}_F(\bullet\pi)$ records the abstract value describing the possible actual 
parameters, and for each exit point $\pi\bullet$, $\mathcal{W}_F(\pi\bullet)$ records the 
abstract value describing the possible results of the call. 

— The store cache $\mathcal{S}_F$: for each entry point $\bullet\pi$, 
$\mathcal{S}_F(\bullet\pi)$ records the abstract store describing the possible stores 
at function entry, and for each exit point $\pi\bullet$, $\mathcal{S}_F(\pi\bullet)$ 
records the abstract store describing the possible stores at function exit. 


Example 3. For the example program we may take the following caches: 



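\[
\begin{array}{l|lll}
\pi & x & y & z \\
\hline
\mathcal{W}_F(\bullet\pi) & \{(m_3,(z,\diamond))\} & \emptyset & \{(m_1,(z,\diamond)),\ (m_2,(y,m_3))\} \\
\mathcal{W}_F(\pi\bullet) & \{(m_3,(y,m_3))\} & \emptyset & \{(m_1,(z,\diamond)),\ (m_2,(y,m_3))\} \\
\mathcal{S}_F(\bullet\pi) & [\,] & [\,] & [\,] \\
\mathcal{S}_F(\pi\bullet) & [\,] & [\,] & [\,] \\
\mathcal{R}^d_F(\pi) & [\,] & [x \mapsto \{(m_3,(z,\diamond))\}] & [\,] \\
\mathcal{R}^c_F(\pi) & [\,] & [\,] & [\,] \\
\mathcal{M}_F(\pi) & \{m_3\} & \emptyset & \{m_1, m_2\}
\end{array}
\]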


4 Syntax-directed Analysis 

The specification developed in this section is a recipe for checking that a proposed 
solution is indeed acceptable. This is useful when changing libraries of support 
code or when installing software in new environments: one merely needs to check 
that the new libraries or environments satisfy the solution used to optimise the 
program. It can also be used as the basis for generating a set of constraints [17]
whose least solution can be obtained using standard techniques (e.g. [2]).

Given a program e and the five caches ($\mathcal{R}^d_F$, $\mathcal{R}^c_F$, $\mathcal{M}_F$, $\mathcal{W}_F$, $\mathcal{S}_F$) the purpose of
the analysis is to check whether or not the caches are acceptable solutions to the
Data and Control Flow Analysis. The first step is to find (or guess) the following
auxiliary information:

— an abstract environment $R \in \mathrm{Env}$ describing the free variables in e (and
typically it is $\bot$ if there are no free variables in the program),

— a set of mementoes $M \in \mathcal{P}(\mathrm{Mem})$ describing the possible contexts in which
e can be evaluated (and typically it is $\{\diamond\}$),

— an initial abstract store $S_1 \in \mathrm{Store}$ describing the mutable store before
evaluation of e begins (and typically it is $\bot$ if the store is not initialised before
use),

— a final abstract store $S_2 \in \mathrm{Store}$ describing the mutable store after evaluation
of e completes (and possibly it is $\top$), and

— an abstract value $W \in \mathrm{Val}$ describing the value that e can evaluate to (and
it also possibly is $\top$).

The second step is to check whether or not the formula

$R, M \rhd e : S_1 \to S_2 \;\&\; W$

is satisfied with respect to the caches supplied. This means that when e is
executed in an environment described by $R$, in a context described by $M$, and upon
a state described by $S_1$ the following happens: if e terminates successfully then
the resulting state is described by $S_2$ and the resulting value by $W$.

We shall first specify the analysis for the functional fragment of the language 
(Table 3) and then for the other constructs (Table 4). As in [16] any free variable 
on the right-hand side of the clauses should be regarded as existentially quanti- 
fied; in principle this means that their values need to be guessed, but in practice 
the best (or least) guess mostly follows from the subformulae. 

Example 4. Given the caches of Example 3, we shall check the formula:

$[\,], \{\diamond\} \rhd \mathit{program} : [\,] \to [\,] \;\&\; \{(\diamond, (y, m_3))\}$

So the initial environment is empty, the initial context is $\diamond$, the program does
not manipulate the store, and the final value is described by $\{(\diamond, (y, m_3))\}$. □



The functional fragment. For all five constructs in the functional fragment 
of the language the handling of the store is straightforward since it is threaded 
in the same way as in the semantics. 

For constants and variables it is fairly straightforward to determine the abstract 
value for the construct; in the case of variables we obtain it from the environment 
and in the other case we construct it from the set M of mementoes of interest. 

For function definitions no changes need take place in the store so the abstract 
store is simply threaded as in the previous cases. The abstract value representing 
the function definition contains a nested pair (a triple) for each memento $m$
in the set $M$ of mementoes according to which the function definition can be
reached: in a nested pair $(m_1, (\pi, m_2))$ the memento $m_1$ represents the current
context and the pair $(\pi, m_2)$ represents the value produced (and demanding that
$m_1 = m_2$ corresponds to performing a precise relational analysis rather than a
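\[
\begin{array}{l}
R, M \rhd c : S_1 \to S_2 \;\&\; W \\
\quad \text{iff}\ \ S_1 \sqsubseteq S_2 \;\wedge\; \{(m, d_c) \mid m \in M\} \subseteq W \\[1ex]
R, M \rhd x : S_1 \to S_2 \;\&\; W \\
\quad \text{iff}\ \ S_1 \sqsubseteq S_2 \;\wedge\; R(x) \subseteq W \\[1ex]
R, M \rhd \mathtt{fn}_\pi\ x \mathtt{=>} e : S_1 \to S_2 \;\&\; W \\
\quad \text{iff}\ \ S_1 \sqsubseteq S_2 \;\wedge\; \{(m, (\pi, m)) \mid m \in M\} \subseteq W \;\wedge\; R \sqsubseteq \mathcal{R}^d_F(\pi) \;\wedge \\
\qquad\ \ \mathcal{R}^c_F(\pi)[x \mapsto \mathcal{W}_F(\bullet\pi)],\ \mathcal{M}_F(\pi) \rhd e : \mathcal{S}_F(\bullet\pi) \to \mathcal{S}_F(\pi\bullet) \;\&\; \mathcal{W}_F(\pi\bullet) \\[1ex]
R, M \rhd \mathtt{fun}_\pi\ f\ x \mathtt{=>} e : S_1 \to S_2 \;\&\; W \\
\quad \text{iff}\ \ S_1 \sqsubseteq S_2 \;\wedge\; \{(m, (\pi, m)) \mid m \in M\} \subseteq W \;\wedge \\
\qquad\ \ R[f \mapsto \{(m, (\pi, m)) \mid m \in M\}] \sqsubseteq \mathcal{R}^d_F(\pi) \;\wedge \\
\qquad\ \ \mathcal{R}^c_F(\pi)[x \mapsto \mathcal{W}_F(\bullet\pi)],\ \mathcal{M}_F(\pi) \rhd e : \mathcal{S}_F(\bullet\pi) \to \mathcal{S}_F(\pi\bullet) \;\&\; \mathcal{W}_F(\pi\bullet) \\[1ex]
R, M \rhd (e_1\ e_2)^l : S_1 \to S_4 \;\&\; W \\
\quad \text{iff}\ \ R, M \rhd e_1 : S_1 \to S_2 \;\&\; W_1 \;\wedge\; R, M \rhd e_2 : S_2 \to S_3 \;\&\; W_2 \;\wedge \\
\qquad\ \ \forall \pi \in \{\pi \mid (m, (\pi, m_d)) \in W_1\} : \\
\qquad\quad \text{let}\ X = \mathit{new}_\pi((l, M), W_2, S_3, W_1) \\
\qquad\qquad X_{dc} = \{(m_d, m_c) \mid (m_d, m_h, m_c) \in X\} \\
\qquad\qquad X_{c} = \{m_c \mid (m_d, m_h, m_c) \in X\} \\
\qquad\qquad X_{hc} = \{(m_h, m_c) \mid (m_d, m_h, m_c) \in X\} \\
\qquad\qquad X_{ch} = \{(m_c, m_h) \mid (m_d, m_h, m_c) \in X\} \\
\qquad\quad \text{in}\ \ \mathcal{R}^d_F(\pi)\lceil X_{dc}\rceil \sqsubseteq \mathcal{R}^c_F(\pi) \;\wedge\; X_c \subseteq \mathcal{M}_F(\pi) \;\wedge \\
\qquad\qquad W_2\lceil X_{hc}\rceil \subseteq \mathcal{W}_F(\bullet\pi) \;\wedge\; S_3\lceil X_{hc}\rceil \sqsubseteq \mathcal{S}_F(\bullet\pi) \;\wedge \\
\qquad\qquad \mathcal{W}_F(\pi\bullet)\lceil X_{ch}\rceil \subseteq W \;\wedge\; \mathcal{S}_F(\pi\bullet)\lceil X_{ch}\rceil \sqsubseteq S_4 \\[1ex]
\mathit{new}_\pi((l, M), W, S, W') = \\
\quad \{(m_d, m_h, m_c) \mid (m_h, (\pi, m_d)) \in W',\ m_h \in M,\ m_c = \mathit{new}((l, m_h), W, S, (\pi, m_d))\}
\end{array}
\]

Table 3. Analysis of the functional fragment.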



less precise independent attribute analysis). Finally, the body of the function 
is analysed in the relevant abstract environment, memento set, initial abstract 
state, final abstract state and final abstract value; this information is obtained 
from the caches that are in turn updated at the corresponding call points. More 
precisely, the idea is to record the abstract environment at the definition point
in the cache $\mathcal{R}^d_F$ and then to analyse the body of the function in the context of
the call which is specified by the caches $\mathcal{R}^c_F$, $\mathcal{M}_F$, $\mathcal{W}_F$ and $\mathcal{S}_F$ as explained in
Section 3. The clause for recursive functions is similar.

Example 5. To check the formula of Example 4 we need among other things to 
check: 

$[\,], \{\diamond\} \rhd \mathtt{fn}_z\ z \mathtt{=>} z : [\,] \to [\,] \;\&\; \{(\diamond, (z, \diamond))\}$

This follows from the clause for function definition because $[\,] \sqsubseteq [\,]$ and the
clause for variables gives:

$[z \mapsto \{(m_1,(z,\diamond)), (m_2,(y,m_3))\}], \{m_1, m_2\} \rhd z : [\,] \to [\,] \;\&\; \{(m_1,(z,\diamond)), (m_2,(y,m_3))\}$

Note that although the function z is called twice, it is only analysed once. □

In the clause for the function application $(e_1\ e_2)^l$ we first analyse the operator
and the operand while threading the store. Then we use $W_1$ to determine which
functions can be called and for each such function $\pi$ we proceed in the following
way.

First we determine the mementoes to be used for analysing the body of the
function $\pi$. More precisely we calculate a set $X$ of triples $(m_d, m_h, m_c)$ consisting
of a definition memento $m_d$ describing the point where the function $\pi$ was defined,
a current memento $m_h$ describing the call point, and a memento $m_c$ describing the
entry point to the procedure body. (For the call $(x\ x)^1$ in Example 1 we would
have $X = \{(\diamond, m_3, m_1)\}$ and $\pi = z$.) For this we use the operation $\mathit{new}_\pi$ whose
definition (see Table 3) uses the function

$\mathit{new} : (\mathrm{Lab} \times \mathrm{Mem}) \times \mathrm{Val} \times \mathrm{Store} \times (\mathrm{Pnt}_F \times \mathrm{Mem}) \to \mathrm{Mem}$

for converting its argument to a memento. With Mem defined as in Table 1 this
will be the identity function but for simpler choices of Mem it will discard some
of the information supplied by its argument.

The sets $X_{dc}$, $X_c$, $X_{hc}$, and $X_{ch}$ are “projections” of $X$. The body of the function
$\pi$ will be analysed in the set of mementoes obtained as $X_c$ and therefore $X_c$ is
recorded in the cache $\mathcal{M}_F$ for use in the clause defining the function. Because
the function body is analysed in this set of mementoes we need to modify the
memento components of all the relevant abstract values. For this we use the
operation

$W\lceil Y\rceil = \{(m_2, v) \mid (m_1, v) \in W,\ (m_1, m_2) \in Y\}$

defined on $W \in \mathrm{Val}$ and $Y \subseteq \mathrm{Mem} \times \mathrm{Mem}$. This operation is lifted to abstract
environments and abstract stores in a pointwise manner.
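
The operation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: abstract values are modelled as sets of (memento, value) pairs, environments and stores as dictionaries, mementoes as plain strings, and the names `transfer` and `transfer_pointwise` are hypothetical.

```python
def transfer(w, y):
    """W⌈Y⌉: keep the pairs of W whose memento occurs on the left of Y
    and re-index them by the corresponding right-hand mementoes."""
    return {(m2, v) for (m1, v) in w for (m1_, m2) in y if m1 == m1_}


def transfer_pointwise(table, y):
    """Lift the operation to an abstract environment or store, modelled
    here as a dict from variables (or cells) to abstract values."""
    return {key: transfer(w, y) for key, w in table.items()}


# Hypothetical data: memento "a" is mapped to both "c" and "d",
# memento "b" is not mentioned in Y and is therefore dropped.
w = {("a", 1), ("b", 2)}
y = {("a", "c"), ("a", "d")}
print(sorted(transfer(w, y)))  # [('c', 1), ('d', 1)]
```

Pairs whose memento does not occur in $Y$ simply disappear, which is how the operation restricts attention to the call under consideration.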




Fig. 2. Analysis of function call. 
Coming back to the clause for application in Table 3, the abstract environment
$\mathcal{R}^d_F(\pi)$ is relative to the mementoes of the definition point for the function and
thus has to be modified so as to be relative to the mementoes of the called
function body and the set $X_{dc}$ facilitates performing this transformation. (For
the call $(x\ x)^1$ in Example 1 we would have that $X_{dc} = \{(\diamond, m_1)\}$.) In this way we
ensure that we have static scoping of the free variables of the function. The actual
parameter $W_2$ is relative to the mementoes of the application point and has to be
modified so as to be relative to the mementoes of the called function body and
the set $X_{hc}$ facilitates performing this transformation; a similar modification is
needed for the abstract store at the entry point. We also need to link the results
of the analysis of the function body back to the application point and here the
relevant transformation is facilitated by the set $X_{ch}$.

The clause for application is illustrated in Figure 2. On the left-hand side we 
have the application point with explicit nodes for the call and the return. The 
dotted lines represent the abstract environment and the relevant set of memen- 
toes whereas the solid lines represent the values (actual parameter and result) 
and the store. The transfer function $\lceil X_{dc}\rceil$ is used to modify the static
environment of the definition point, the transfer function $\lceil X_{hc}\rceil$ is used to go from the
application point to the function body and the transfer function $\lceil X_{ch}\rceil$ is used
to go back from the function body to the application point. Note that the figure
clearly indicates the different paths taken by environment-like information and 
store-like information - something that is not always clear from similar figures 
appearing in the literature (see Section 5). 

Example 6. Checking the formula of Example 4 also involves checking: 

$[x \mapsto \{(m_3, (z,\diamond))\}], \{m_3\} \rhd (x\ x)^1 : [\,] \to [\,] \;\&\; \{(m_3, (z,\diamond))\}$

For this, the clause for application demands that we check

$[x \mapsto \{(m_3, (z,\diamond))\}], \{m_3\} \rhd x : [\,] \to [\,] \;\&\; \{(m_3, (z,\diamond))\}$

which follows directly from the clause for variables.

Only the function z can be called so we have to check the many conditions only
for this function. We shall concentrate on checking that $\{(m_3, (z,\diamond))\}\lceil X_{hc}\rceil \subseteq \mathcal{W}_F(\bullet z)$
and $\mathcal{W}_F(z\bullet)\lceil X_{ch}\rceil \subseteq \{(m_3, (z,\diamond))\}$. Since $X = \{(\diamond, m_3, m_1)\}$ we have
$X_{hc} = \{(m_3, m_1)\}$ and the effect of the transformation will be to remove all pairs
that do not have $m_3$ as the first component and to replace the first components
of the remaining pairs with $m_1$; using Example 3 it is immediate to verify that
the condition actually holds. Similarly, $X_{ch} = \{(m_1, m_3)\}$ so in this case the
transformation will remove pairs that do not have $m_1$ as the first component
(i.e. pairs that do not correspond to the current call point) and replace the first
components of the remaining pairs with $m_3$; again it is immediate to verify that
the condition holds. □
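
Written out with the caches of Example 3, the two checks of Example 6 amount to the following computations (a worked instance of the operation $W\lceil Y\rceil$; nothing beyond the data of the examples is assumed):

\[
\{(m_3,(z,\diamond))\}\lceil\{(m_3,m_1)\}\rceil = \{(m_1,(z,\diamond))\} \subseteq \{(m_1,(z,\diamond)),\ (m_2,(y,m_3))\} = \mathcal{W}_F(\bullet z)
\]

\[
\mathcal{W}_F(z\bullet)\lceil\{(m_1,m_3)\}\rceil = \{(m_1,(z,\diamond)),\ (m_2,(y,m_3))\}\lceil\{(m_1,m_3)\}\rceil = \{(m_3,(z,\diamond))\} \subseteq \{(m_3,(z,\diamond))\}
\]

In the second computation the pair $(m_2,(y,m_3))$ is dropped because $m_2$ does not occur as a first component in $X_{ch}$.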

Other constructs. The clauses for the other constructs of the language are 
shown in Table 4. The clauses reflect that the abstract environment and the set 
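\[
\begin{array}{l}
R, M \rhd e_1 ; e_2 : S_1 \to S_3 \;\&\; W_2 \\
\quad \text{iff}\ \ R, M \rhd e_1 : S_1 \to S_2 \;\&\; W_1 \;\wedge\; R, M \rhd e_2 : S_2 \to S_3 \;\&\; W_2 \\[1ex]
R, M \rhd \mathtt{ref}_\varpi\ e : S_1 \to S_3 \;\&\; W' \\
\quad \text{iff}\ \ R, M \rhd e : S_1 \to S_2 \;\&\; W \;\wedge\; \{(m, (\varpi, m)) \mid m \in M\} \subseteq W' \;\wedge\; S_2 \sqsubseteq S_3 \;\wedge \\
\qquad\ \ \forall m \in M : W \subseteq S_3(\varpi, m) \\[1ex]
R, M \rhd\ !e : S_1 \to S_2 \;\&\; W' \\
\quad \text{iff}\ \ R, M \rhd e : S_1 \to S_2 \;\&\; W \;\wedge\; \forall (m, (\varpi, m_d)) \in W : S_2(\varpi, m_d) \subseteq W' \\[1ex]
R, M \rhd e_1 := e_2 : S_1 \to S_4 \;\&\; W \\
\quad \text{iff}\ \ R, M \rhd e_1 : S_1 \to S_2 \;\&\; W_1 \;\wedge\; R, M \rhd e_2 : S_2 \to S_3 \;\&\; W_2 \;\wedge \\
\qquad\ \ \{(m, d_{()}) \mid m \in M\} \subseteq W \;\wedge\; S_3 \sqsubseteq S_4 \;\wedge\; \forall (m, (\varpi, m_d)) \in W_1 : W_2 \subseteq S_4(\varpi, m_d) \\[1ex]
R, M \rhd \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 : S_1 \to S_3 \;\&\; W_2 \\
\quad \text{iff}\ \ R, M \rhd e_1 : S_1 \to S_2 \;\&\; W_1 \;\wedge\; R[x \mapsto W_1], M \rhd e_2 : S_2 \to S_3 \;\&\; W_2 \\[1ex]
R, M \rhd \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 : S_1 \to S_5 \;\&\; W' \\
\quad \text{iff}\ \ R, M \rhd e : S_1 \to S_2 \;\&\; W \;\wedge \\
\qquad\ \ \text{let}\ R_1 = \varphi^{e,W}_{\mathit{true}}(R);\ R_2 = \varphi^{e,W}_{\mathit{false}}(R);\ S_3 = \varphi^{e,W}_{\mathit{true}}(S_2);\ S_4 = \varphi^{e,W}_{\mathit{false}}(S_2) \\
\qquad\ \ \text{in}\ \ R_1, M \rhd e_1 : S_3 \to S_5 \;\&\; W' \;\wedge\; R_2, M \rhd e_2 : S_4 \to S_5 \;\&\; W'
\end{array}
\]

Table 4. Analysis of the other constructs.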



of mementoes are passed to the subexpressions in a syntax-directed way and 
that the store is threaded through the constructs. The analysis is fairly
simple-minded in that it does not try to predict when a reference $(\varpi, m_d)$ in the analysis
only represents one location in the semantics and hence the analysis does not
contain any kill-components (but see Appendix B).

For the let-construct we perform the expected threading of the abstract environ- 
ment and the abstract store. For the conditional we first analyse the condition. 
Based on the outcome we then modify the environment and the store to reflect 
the (abstract) value of the test. For the environment we use the transfer
functions $\varphi^{e,W}_{\mathit{true}}(R)$ and $\varphi^{e,W}_{\mathit{false}}(R)$ whereas for the store we use the transfer functions
$\varphi^{e,W}_{\mathit{true}}(S_2)$ and $\varphi^{e,W}_{\mathit{false}}(S_2)$. The results of both branches are possible for the whole
construct.

As an example of the use of these transfer functions consider the expression
if x then $e_1$ else $e_2$ where it will be natural to set

$\varphi^{x,W}_{\mathit{true}}(R) = R[x \mapsto W \cap \{(m, d_{\mathit{true}}) \mid m \in \mathrm{Mem}\}]$

and similarly for $\varphi^{x,W}_{\mathit{false}}(R)$. Thus it will be possible to analyse each of the
branches with precise information about x.

Little can be said in general about how to define such functions; to obtain a
more concise statement of the theorems below we shall assume that the transfer
functions $\varphi^{e,W}_{\mathit{true}}$ and $\varphi^{e,W}_{\mathit{false}}$ of Table 4 are in fact the identities.
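\[
\begin{array}{rclcl}
m_k & \in & \mathrm{Mem}_k & = & \mathrm{Lab}^{\le k} \\
d & \in & \mathrm{Data} & = & \cdots\ \text{(unspecified)} \\
(\pi, m_{kd}) & \in & \mathrm{Closure}_k & = & \mathrm{Pnt}_F \times \mathrm{Mem}_k \\
(\varpi, m_{kd}) & \in & \mathrm{Cell}_k & = & \mathrm{Pnt}_R \times \mathrm{Mem}_k \\
v_k & \in & \mathrm{Val}_{Ak} & = & \mathrm{Data} \cup \mathrm{Closure}_k \cup \mathrm{Cell}_k \\
W_k & \in & \mathrm{Val}_k & = & \mathcal{P}(\mathrm{Mem}_k \times \mathrm{Val}_{Ak}) \\
R_k & \in & \mathrm{Env}_k & = & \mathrm{Var} \to \mathrm{Val}_k \\
S_k & \in & \mathrm{Store}_k & = & \mathrm{Cell}_k \to \mathrm{Val}_k
\end{array}
\]

Table 5. Abstract domains for k-CFA.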



5 Classical Approximations 

k-CFA. The idea behind k-CFA [11,24] is to restrict the mementoes to keep
track of the last k call sites only. This leads to the abstract domains of Table 5
that are intended to replace Table 1. Naturally, the analysis of Tables 2, 3, and 4
must be modified to use the new abstract domains; also the function $\mathit{new}_\pi$ must
be modified to make use of the function

$\mathit{new}_k : (\mathrm{Lab} \times \mathrm{Mem}_k) \times \mathrm{Val}_k \times \mathrm{Store}_k \times (\mathrm{Pnt}_F \times \mathrm{Mem}_k) \to \mathrm{Mem}_k$

defined by $\mathit{new}_k((l, m_{kh}), W_k, S_k, (\pi, m_{kd})) = \mathrm{take}_k(l\,\hat{}\,m_{kh})$ where “$\hat{}$” denotes
prefixing and $\mathrm{take}_k$ returns the first k elements of its argument. This completes
the definition of the analysis.
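
As a small worked instance (the labels $l_1$, $l_2$, $l_3$ are hypothetical and only serve as illustration), take $k = 2$ and a call at label $l_3$ made in the context $m_{kh} = l_1\,\hat{}\,l_2$. Then

\[
\mathit{new}_2((l_3, l_1\,\hat{}\,l_2), W_k, S_k, (\pi, m_{kd})) = \mathrm{take}_2(l_3\,\hat{}\,l_1\,\hat{}\,l_2) = l_3\,\hat{}\,l_1
\]

so the most recent call site is placed first, the oldest one ($l_2$) is forgotten, and neither the arguments $W_k$ and $S_k$ nor the called closure influence the result.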



Theoretical properties. One of the strong points of our approach is that we can 
use the framework of Abstract Interpretation to describe how the more tractable 
choices of mementoes arise from the general definition. 

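To express the relationship between the two analyses define a surjective mapping
$\mu_k : \mathrm{Mem} \to \mathrm{Mem}_k$ showing how the precise mementoes of Table 1 are
truncated into the approximative mementoes of Table 5. It is defined by $\mu_0(m) = \varepsilon$,
$\mu_{k+1}(\diamond) = \varepsilon$, $\mu_{k+1}((l, m), W, S, (\pi, m_d)) = l\,\hat{}\,\mu_k(m)$ where $\varepsilon$ denotes the empty
sequence. It gives rise to the functions $\alpha^M_k : \mathcal{P}(\mathrm{Mem}) \to \mathcal{P}(\mathrm{Mem}_k)$ and
$\gamma^M_k : \mathcal{P}(\mathrm{Mem}_k) \to \mathcal{P}(\mathrm{Mem})$ defined by $\alpha^M_k(M) = \{\mu_k(m) \mid m \in M\}$ and
$\gamma^M_k(M_k) = \{m \mid \mu_k(m) \in M_k\}$. Since $\alpha^M_k$ is surjective and defined in a pointwise manner
there exists precisely one function $\gamma^M_k$ such that

\[
\mathcal{P}(\mathrm{Mem}) \;\underset{\alpha^M_k}{\overset{\gamma^M_k}{\leftrightarrows}}\; \mathcal{P}(\mathrm{Mem}_k)
\]

is a Galois insertion as studied in Abstract Interpretation [4]: this means that
$\alpha^M_k$ and $\gamma^M_k$ are both monotone and that $\gamma^M_k(\alpha^M_k(M)) \supseteq M$ and $\alpha^M_k(\gamma^M_k(M_k)) = M_k$
for all $M \subseteq \mathrm{Mem}$ and $M_k \subseteq \mathrm{Mem}_k$. One may check that $\gamma^M_k$ is as displayed above.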
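
The truncation and the two Galois insertion properties can be tried out on a small finite set of mementoes. The following Python sketch is only an illustration under stated assumptions: precise mementoes are modelled as nested tuples, the top-level memento as `None`, the value and store components are left empty because the truncation ignores them, the concretisation is restricted to a finite universe, and the three sample mementoes are hypothetical.

```python
TOP = None  # stands for the top-level memento


def memento(label, caller, closure):
    """A precise memento ((l, m_h), W, S, (pi, m_d)); the W and S
    components are left empty because mu ignores them."""
    return ((label, caller), frozenset(), (), closure)


def mu(k, m):
    """Truncate a precise memento to its k most recent call-site labels."""
    if k == 0 or m is TOP:
        return ()
    (label, caller), _w, _s, _closure = m
    return (label,) + mu(k - 1, caller)


def alpha(k, ms):
    """Abstraction: apply mu pointwise."""
    return frozenset(mu(k, m) for m in ms)


def gamma(k, mks, universe):
    """Concretisation, restricted to a finite universe of mementoes."""
    return frozenset(m for m in universe if mu(k, m) in mks)


m3 = memento(3, TOP, ("x", TOP))
m1 = memento(1, m3, ("z", TOP))
m2 = memento(2, m3, ("z", TOP))
universe = [TOP, m1, m2, m3]

print(mu(1, m1), mu(2, m1))  # (1,) (1, 3)
```

With $k = 1$ the mementoes `m1` and `m2` remain distinguishable, while with $k = 0$ all mementoes collapse to the empty sequence.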
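To obtain a Galois insertion

\[
\mathrm{Val} \;\underset{\alpha^V_k}{\overset{\gamma^V_k}{\leftrightarrows}}\; \mathrm{Val}_k
\]

we first define a surjective mapping $\eta_k : \mathrm{Mem} \times \mathrm{Val}_A \to \mathrm{Mem}_k \times \mathrm{Val}_{Ak}$ by
taking $\eta_k(m_h, d) = (\mu_k(m_h), d)$, $\eta_k(m_h, (\pi, m_d)) = (\mu_k(m_h), (\pi, \mu_k(m_d)))$, and
$\eta_k(m_h, (\varpi, m_d)) = (\mu_k(m_h), (\varpi, \mu_k(m_d)))$. Next define $\alpha^V_k$ and $\gamma^V_k$ by
$\alpha^V_k(W) = \{\eta_k(m, v) \mid (m, v) \in W\}$ and $\gamma^V_k(W_k) = \{(m, v) \mid \eta_k(m, v) \in W_k\}$. It is then
straightforward to obtain a Galois insertion

\[
\mathrm{Env} \;\underset{\alpha^R_k}{\overset{\gamma^R_k}{\leftrightarrows}}\; \mathrm{Env}_k
\]

by setting $\alpha^R_k(R)(x) = \alpha^V_k(R(x))$ and $\gamma^R_k(R_k)(x) = \gamma^V_k(R_k(x))$. To obtain a
Galois insertion

\[
\mathrm{Store} \;\underset{\alpha^S_k}{\overset{\gamma^S_k}{\leftrightarrows}}\; \mathrm{Store}_k
\]

define $\alpha^S_k(S)(\varpi, m_{kd}) = \alpha^V_k(\bigcup\{S(\varpi, m_d) \mid \mu_k(m_d) = m_{kd}\})$ and
$\gamma^S_k(S_k)(\varpi, m_d) = \gamma^V_k(S_k(\varpi, \mu_k(m_d)))$.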

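We now have the machinery needed to state the relationship between the present
k-CFA analysis (denoted $\rhd_k$) and the general analysis of Section 4 (denoted $\rhd$):

Theorem 1. If $(\mathcal{R}^d_{kF}, \mathcal{R}^c_{kF}, \mathcal{M}_{kF}, \mathcal{W}_{kF}, \mathcal{S}_{kF})$ satisfies

$R_k, M_k \rhd_k e : S_{k1} \to S_{k2} \;\&\; W_k$

then $(\gamma^R_k \circ \mathcal{R}^d_{kF},\ \gamma^R_k \circ \mathcal{R}^c_{kF},\ \gamma^M_k \circ \mathcal{M}_{kF},\ \gamma^V_k \circ \mathcal{W}_{kF},\ \gamma^S_k \circ \mathcal{S}_{kF})$ satisfies

$\gamma^R_k(R_k), \gamma^M_k(M_k) \rhd e : \gamma^S_k(S_{k1}) \to \gamma^S_k(S_{k2}) \;\&\; \gamma^V_k(W_k)$.

In the full version we establish the semantic correctness of the analysis of Section
4; it then follows that semantic correctness holds for k-CFA as well.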



Call strings of length k. The clause for application involves a number of 
transfers using the set X relating definition mementoes, current mementoes and 
mementoes of the called function body. In the case of a k-CFA-like approach it
may be useful to simplify these transfers.

The transfer using $X_{hc}$ can be implemented in a simple way by taking

$X_{hc} = \{(m_h, \mathrm{take}_k(l\,\hat{}\,m_h)) \mid m_h \in M\}$

where $l$ is the label of the application point. This set may be slightly too large
because it is no longer allowed to depend on the actual function called (the
$\pi$) and because there may be $m_h \in M$ for which no $(m_h, (\pi, m_d))$ is ever an
element of $W_1$. However, this is just a minor imprecision aimed at facilitating a
more efficient implementation. In a similar way, one may take

$X_c = \{\mathrm{take}_k(l\,\hat{}\,m_h) \mid m_h \in M\}$

where again this set may be slightly too large.

The transfers using $X_{ch}$ can also be somewhat simplified by taking

\[
\begin{array}{rcl}
X_{ch} & = & \{(\mathrm{take}_k(l\,\hat{}\,m_h), m_h) \mid m_h \in M\} \\
 & = & \{(m_c, \mathrm{drop}_1(m_c)) \mid \mathrm{drop}_1(m_c) \in M\} \\
 & & \cup\ \{(m_c, \mathrm{drop}_1(m_c)\,\hat{}\,l') \mid \mathrm{drop}_1(m_c)\,\hat{}\,l' \in M\}
\end{array}
\]

where $\mathrm{drop}_1$ drops the first element of its argument (yielding $\varepsilon$ if the argument
does not have at least two elements). Again this set may be slightly too large.

The transfer using $X_{dc}$ can be rewritten as

$X_{dc} = \{(m_d, \mathrm{take}_k(l\,\hat{}\,m_h)) \mid m_h \in M,\ (m_h, (\pi, m_d)) \in W_1\}$

where $l$ is the application point and $\pi$ is the function called.
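
The simplified sets can be computed directly from the call strings. The following Python sketch is an illustration only: call strings are modelled as tuples with the most recent call site first, the operator value `w1` is assumed to contain closures only (pairs of a function label and a definition memento), and the function name `call_string_sets` is hypothetical.

```python
def take(k, seq):
    """First k elements of a call string (most recent call site first)."""
    return tuple(seq[:k])


def drop1(seq):
    """Drop the first element; short strings yield the empty string."""
    return tuple(seq[1:])


def call_string_sets(k, label, ms, w1, pi):
    """Simplified X_hc, X_c, X_ch and X_dc for an application labelled
    `label`, analysed in the contexts `ms`, with operator value `w1`
    (assumed to hold closures only) and called function `pi`."""
    def entry(mh):
        return take(k, (label,) + tuple(mh))

    x_hc = {(mh, entry(mh)) for mh in ms}
    x_c = {entry(mh) for mh in ms}
    x_ch = {(mc, mh) for (mh, mc) in x_hc}
    x_dc = {(md, entry(mh))
            for (mh, (f, md)) in w1 if f == pi and mh in ms}
    return x_hc, x_c, x_ch, x_dc


# Hypothetical call: application "l1" in context ("l3",), calling z
# which was defined at top level (empty call string), with k = 1.
ms = {("l3",)}
w1 = {(("l3",), ("z", ()))}
x_hc, x_c, x_ch, x_dc = call_string_sets(1, "l1", ms, w1, "z")
print(x_hc, x_c, x_ch, x_dc)
```

Only `x_dc` inspects the operator value and the called function; the other three sets depend on the call site and the current contexts alone, which is the source of the slight imprecision mentioned in the text.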



For functions being defined at top-level there is not likely to be too much
information that needs to be transformed using $X_{dc}$; however, simplifying $X_{dc}$ to
be independent of $\pi$ is likely to be grossly imprecise.

Performing these modifications to the clause for application there is no longer
any need for an explicit call of $\mathit{new}_\pi$. The resulting analysis is similar in spirit
to the call string based analysis of [27]; the scenario of [23] is simpler because
the language considered there does not allow local data. Since we have changed
the definition of the sets $X_{dc}$, $X_c$, $X_{hc}$ and $X_{ch}$ to something that is no less than
before, it follows that an analogue of Theorem 1 still applies and therefore the
semantic correctness result still carries over.

Fig. 3. Degenerate analysis of function call.
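\[
\begin{array}{rclcl}
m_p & \in & \mathrm{Mem}_p & = & \{\varepsilon\} \cup \mathcal{P}(\mathrm{Data} \cup \mathrm{Pnt}_F \cup \mathrm{Pnt}_R) \\
d & \in & \mathrm{Data} & = & \cdots\ \text{(unspecified)} \\
(\pi, m_{pd}) & \in & \mathrm{Closure}_p & = & \mathrm{Pnt}_F \times \mathrm{Mem}_p \\
(\varpi, m_{pd}) & \in & \mathrm{Cell}_p & = & \mathrm{Pnt}_R \times \mathrm{Mem}_p \\
v_p & \in & \mathrm{Val}_{Ap} & = & \mathrm{Data} \cup \mathrm{Closure}_p \cup \mathrm{Cell}_p \\
W_p & \in & \mathrm{Val}_p & = & \mathcal{P}(\mathrm{Mem}_p \times \mathrm{Val}_{Ap}) \\
R_p & \in & \mathrm{Env}_p & = & \mathrm{Var} \to \mathrm{Val}_p \\
S_p & \in & \mathrm{Store}_p & = & \mathrm{Cell}_p \to \mathrm{Val}_p
\end{array}
\]

Table 6. Abstract domains for assumption sets.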



It is interesting to note that if the distinction between environment and store is 
not clearly maintained then Figure 2 degenerates to the form of Figure 3; this 
is closely related to the scenario in [22] (that is somewhat less general). 



Assumption sets. The idea behind this analysis is to restrict the mementoes
to keep track of the parameter of the last function call only; such information
is often called assumption sets. This leads to the abstract domains of Table 6
that are intended to replace Table 1. Naturally, the analysis of Tables 2, 3, and 4
must be modified to use the new abstract domains; also the function $\mathit{new}_\pi$ must
be modified to make use of the function

$\mathit{new}_p : (\mathrm{Lab} \times \mathrm{Mem}_p) \times \mathrm{Val}_p \times \mathrm{Store}_p \times (\mathrm{Pnt}_F \times \mathrm{Mem}_p) \to \mathrm{Mem}_p$

given by $\mathit{new}_p((l, m_{ph}), W_p, S_p, (\pi, m_{pd})) = \{\mathrm{keep}_p(v_p) \mid (m_p, v_p) \in W_p\}$ where
$\mathrm{keep}_p : \mathrm{Val}_{Ap} \to (\mathrm{Data} \cup \mathrm{Pnt}_F \cup \mathrm{Pnt}_R)$ is given by $\mathrm{keep}_p(d) = d$, $\mathrm{keep}_p(\pi, m_{pd}) = \pi$,
$\mathrm{keep}_p(\varpi, m_{pd}) = \varpi$.
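
A minimal Python sketch of these two functions follows. It rests on modelling assumptions that are not part of the paper: closures and cells are represented as (point, memento) tuples, data values as any non-tuple Python value, and the empty memento as the string `"eps"`.

```python
def keep(v):
    """keep_p: forget the memento of a closure or cell, keep data as is.
    Closures and cells are modelled as (point, memento) tuples."""
    return v[0] if isinstance(v, tuple) else v


def new_p(call, w_p, s_p, closure):
    """The new memento is the assumption set: the possible actual
    parameters with their own mementoes stripped off. The call site,
    the store and the called closure are all ignored."""
    return frozenset(keep(v) for (_m, v) in w_p)


# Hypothetical actual parameter: either the closure z or the constant 7.
w = {("eps", ("z", "eps")), (frozenset({"z"}), 7)}
print(new_p(("l", "eps"), w, {}, ("x", "eps")) == frozenset({"z", 7}))  # True
```

Two calls with the same set of possible parameters therefore share one memento, regardless of where the calls occur.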



Theoretical properties. We can now mimic the development performed above.
The crucial point is the definition of a surjective mapping $\mu_p : \mathrm{Mem} \to \mathrm{Mem}_p$
showing how the precise mementoes of Table 1 are mapped into the
approximative mementoes of Table 6. It is given by $\mu_p(\diamond) = \varepsilon$, and
$\mu_p((l, m), W, S, (\pi, m_d)) = \{\mathrm{keep}'_p(v') \mid (m', v') \in W\}$, where
$\mathrm{keep}'_p : \mathrm{Val}_A \to (\mathrm{Data} \cup \mathrm{Pnt}_F \cup \mathrm{Pnt}_R)$ is the
obvious modification of $\mathrm{keep}_p$ to work on $\mathrm{Val}_A$ rather than $\mathrm{Val}_{Ap}$. Based on $\mu_p$
we can now define Galois insertions

— $(\alpha^M_p, \gamma^M_p)$ between $\mathcal{P}(\mathrm{Mem})$ and $\mathcal{P}(\mathrm{Mem}_p)$

— $(\alpha^V_p, \gamma^V_p)$ between Val and $\mathrm{Val}_p$

— $(\alpha^R_p, \gamma^R_p)$ between Env and $\mathrm{Env}_p$

— $(\alpha^S_p, \gamma^S_p)$ between Store and $\mathrm{Store}_p$

very much as before and obtain the following analogue of Theorem 1:
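Theorem 2. If $(\mathcal{R}^d_{pF}, \mathcal{R}^c_{pF}, \mathcal{M}_{pF}, \mathcal{W}_{pF}, \mathcal{S}_{pF})$ satisfies

$R_p, M_p \rhd_p e : S_{p1} \to S_{p2} \;\&\; W_p$

then $(\gamma^R_p \circ \mathcal{R}^d_{pF},\ \gamma^R_p \circ \mathcal{R}^c_{pF},\ \gamma^M_p \circ \mathcal{M}_{pF},\ \gamma^V_p \circ \mathcal{W}_{pF},\ \gamma^S_p \circ \mathcal{S}_{pF})$ satisfies

$\gamma^R_p(R_p), \gamma^M_p(M_p) \rhd e : \gamma^S_p(S_{p1}) \to \gamma^S_p(S_{p2}) \;\&\; \gamma^V_p(W_p)$.

As before it is a consequence of the above theorem that semantic correctness
holds for the assumption set analysis as well.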



6 Conclusion 

We have shown how to express interprocedural and context-sensitive Data Flow 
Analysis in a syntax-directed framework that is reminiscent of Control Flow 
Analysis; thereby we have not only extended the ability of Data Flow Analysis 
to deal with higher-order functions but we also have extended the ability of 
Control Flow Analysis to deal with mutable data structures. At the same time 
we have used Abstract Interpretation to pass from the general mementoes of 
Section 3 to the more tractable mementoes of Section 5. In fact all our analyses 
are based on the specification of Tables 3 and 4. 
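\[
\rho \vdash \langle c, \sigma \rangle \to \langle c, \sigma \rangle
\]

\[
\rho \vdash \langle x, \sigma \rangle \to \langle \omega, \sigma \rangle \quad \text{if } \omega = \rho(x)
\]

\[
\rho \vdash \langle \mathtt{fn}_\pi\ x \mathtt{=>} e, \sigma \rangle \to \langle \mathtt{close}\ (\mathtt{fn}_\pi\ x \mathtt{=>} e)\ \mathtt{in}\ \rho, \sigma \rangle
\]

\[
\rho \vdash \langle \mathtt{fun}_\pi\ f\ x \mathtt{=>} e, \sigma \rangle \to \langle \mathtt{close}\ (\mathtt{fun}_\pi\ f\ x \mathtt{=>} e)\ \mathtt{in}\ \rho, \sigma \rangle
\]

\[
\frac{\rho \vdash \langle e_1, \sigma_1 \rangle \to \langle \mathtt{close}\ (\mathtt{fn}_\pi\ x \mathtt{=>} e)\ \mathtt{in}\ \rho', \sigma_2 \rangle \qquad \rho \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega_2, \sigma_3 \rangle \qquad \rho'[x \mapsto \omega_2] \vdash \langle e, \sigma_3 \rangle \to \langle \omega, \sigma_4 \rangle}{\rho \vdash \langle (e_1\ e_2)^l, \sigma_1 \rangle \to \langle \omega, \sigma_4 \rangle}
\]

\[
\frac{\rho \vdash \langle e_1, \sigma_1 \rangle \to \langle \mathtt{close}\ (\mathtt{fun}_\pi\ f\ x \mathtt{=>} e)\ \mathtt{in}\ \rho', \sigma_2 \rangle \qquad \rho \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega_2, \sigma_3 \rangle \qquad \rho'[f \mapsto \mathtt{close}\ (\mathtt{fun}_\pi\ f\ x \mathtt{=>} e)\ \mathtt{in}\ \rho'][x \mapsto \omega_2] \vdash \langle e, \sigma_3 \rangle \to \langle \omega, \sigma_4 \rangle}{\rho \vdash \langle (e_1\ e_2)^l, \sigma_1 \rangle \to \langle \omega, \sigma_4 \rangle}
\]

\[
\frac{\rho \vdash \langle e_1, \sigma_1 \rangle \to \langle \omega_1, \sigma_2 \rangle \qquad \rho \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega_2, \sigma_3 \rangle}{\rho \vdash \langle e_1 ; e_2, \sigma_1 \rangle \to \langle \omega_2, \sigma_3 \rangle}
\]

\[
\frac{\rho \vdash \langle e, \sigma_1 \rangle \to \langle \omega, \sigma_2 \rangle}{\rho \vdash \langle \mathtt{ref}_\varpi\ e, \sigma_1 \rangle \to \langle \iota, \sigma_2[\iota \mapsto \omega] \rangle} \quad \text{where } \iota \text{ is the first unused location}
\]

\[
\frac{\rho \vdash \langle e, \sigma_1 \rangle \to \langle \iota, \sigma_2 \rangle}{\rho \vdash \langle !e, \sigma_1 \rangle \to \langle \omega, \sigma_2 \rangle} \quad \text{where } \omega = \sigma_2(\iota)
\]

\[
\frac{\rho \vdash \langle e_1, \sigma_1 \rangle \to \langle \iota, \sigma_2 \rangle \qquad \rho \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega, \sigma_3 \rangle}{\rho \vdash \langle e_1 := e_2, \sigma_1 \rangle \to \langle (), \sigma_3[\iota \mapsto \omega] \rangle}
\]

\[
\frac{\rho \vdash \langle e_1, \sigma_1 \rangle \to \langle \omega_1, \sigma_2 \rangle \qquad \rho[x \mapsto \omega_1] \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega_2, \sigma_3 \rangle}{\rho \vdash \langle \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2, \sigma_1 \rangle \to \langle \omega_2, \sigma_3 \rangle}
\]

\[
\frac{\rho \vdash \langle e, \sigma_1 \rangle \to \langle \mathtt{true}, \sigma_2 \rangle \qquad \rho \vdash \langle e_1, \sigma_2 \rangle \to \langle \omega, \sigma_3 \rangle}{\rho \vdash \langle \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2, \sigma_1 \rangle \to \langle \omega, \sigma_3 \rangle}
\]

\[
\frac{\rho \vdash \langle e, \sigma_1 \rangle \to \langle \mathtt{false}, \sigma_2 \rangle \qquad \rho \vdash \langle e_2, \sigma_2 \rangle \to \langle \omega, \sigma_3 \rangle}{\rho \vdash \langle \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2, \sigma_1 \rangle \to \langle \omega, \sigma_3 \rangle}
\]

Table 7. Operational semantics.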



A Semantics 

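The semantics is specified as a big-step operational semantics with environments
$\rho \in \mathrm{Env}$ and stores $\sigma \in \mathrm{Store}$. The language has static scope rules and we give it
a traditional call-by-value semantics. The semantic domains are:

$\iota \in \mathrm{Loc} = \cdots$ (unspecified)

$\omega \in \mathrm{Val}$

$\omega ::= c \mid \mathtt{close}\ (\mathtt{fn}_\pi\ x \mathtt{=>} e)\ \mathtt{in}\ \rho \mid \mathtt{close}\ (\mathtt{fun}_\pi\ f\ x \mathtt{=>} e)\ \mathtt{in}\ \rho \mid \iota$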
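\[
\begin{array}{l}
R, M \rhd \mathtt{ref}_\varpi\ e : S_1 \to S_3 \;\&\; W' \\
\quad \text{iff}\ \ R, M \rhd e : S_1 \to S_2 \;\&\; W \;\wedge\; \{(m, (\varpi, m)) \mid m \in M\} \subseteq W' \;\wedge \\
\qquad\ \ \forall m \in M : S_2 \oplus ((\varpi, m), W) \sqsubseteq S_3 \\[1ex]
R, M \rhd\ !e : S_1 \to S_2 \;\&\; W' \\
\quad \text{iff}\ \ R, M \rhd e : S_1 \to S_2 \;\&\; W \;\wedge\; \forall (m, (\varpi, m_d)) \in W : S_2(\varpi, m_d) \sqsubseteq (W', \mathsf{M}) \\[1ex]
R, M \rhd e_1 := e_2 : S_1 \to S_4 \;\&\; W \\
\quad \text{iff}\ \ R, M \rhd e_1 : S_1 \to S_2 \;\&\; W_1 \;\wedge\; R, M \rhd e_2 : S_2 \to S_3 \;\&\; W_2 \;\wedge \\
\qquad\ \ \{(m, d_{()}) \mid m \in M\} \subseteq W \;\wedge \\
\qquad\ \ \forall (m, (\varpi, m_d)) \in W_1 : (S_3 \ominus (\varpi, m_d)) \oplus ((\varpi, m_d), W_2) \sqsubseteq S_4
\end{array}
\]

Table 8. Dealing with reference counts.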



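$\rho \in \mathrm{Env} = \mathrm{Var} \to_{\mathrm{fin}} \mathrm{Val}$
$\sigma \in \mathrm{Store} = \mathrm{Loc} \to_{\mathrm{fin}} \mathrm{Val}$

The set Loc of locations for references is left unspecified. The judgements of the
semantics have the form

$\rho \vdash \langle e, \sigma_1 \rangle \to \langle \omega, \sigma_2 \rangle$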

and are specified in Table 7; the clauses themselves should be fairly straightfor- 
ward. (We should also note that the choice of big-step operational semantics is 
not crucial for the development.) 
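
The rules of Table 7 translate almost line by line into a small evaluator. The Python sketch below is an illustration only and makes several assumptions of its own: expressions are encoded as tagged tuples, the labels $\pi$, $l$ and $\varpi$ are omitted because the semantics does not inspect them, stores are dictionaries, and "the first unused location" is taken to be the current size of the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location."""
    index: int


@dataclass
class Closure:
    """close (fn x => e) in rho, or the corresponding fun variant."""
    term: tuple
    env: dict


def evaluate(env, e, store):
    """Return (value, store') with env |- <e, store> -> <value, store'>."""
    tag = e[0]
    if tag == "con":
        return e[1], store
    if tag == "var":
        return env[e[1]], store
    if tag in ("fn", "fun"):
        return Closure(e, env), store
    if tag == "app":
        clo, s2 = evaluate(env, e[1], store)
        arg, s3 = evaluate(env, e[2], s2)
        body_env = dict(clo.env)
        if clo.term[0] == "fun":          # bind f to the closure itself
            _, f, x, body = clo.term
            body_env[f] = clo
        else:
            _, x, body = clo.term
        body_env[x] = arg
        return evaluate(body_env, body, s3)
    if tag == "seq":
        _, s2 = evaluate(env, e[1], store)
        return evaluate(env, e[2], s2)
    if tag == "ref":
        v, s2 = evaluate(env, e[1], store)
        loc = Loc(len(s2))                # first unused location
        return loc, {**s2, loc: v}
    if tag == "deref":
        loc, s2 = evaluate(env, e[1], store)
        return s2[loc], s2
    if tag == "asgn":
        loc, s2 = evaluate(env, e[1], store)
        v, s3 = evaluate(env, e[2], s2)
        return (), {**s3, loc: v}
    if tag == "let":
        v, s2 = evaluate(env, e[2], store)
        return evaluate({**env, e[1]: v}, e[3], s2)
    if tag == "if":
        b, s2 = evaluate(env, e[1], store)
        return evaluate(env, e[2] if b else e[3], s2)
    raise ValueError(f"unknown construct: {tag!r}")


# let r = ref 1 in (r := 2 ; !r)
prog = ("let", "r", ("ref", ("con", 1)),
        ("seq", ("asgn", ("var", "r"), ("con", 2)), ("deref", ("var", "r"))))
value, final_store = evaluate({}, prog, {})
print(value)  # 2
```

The store is threaded through the subexpressions in exactly the order in which the premises of each rule are listed, which is the same threading that the analysis of Tables 3 and 4 mirrors with abstract stores.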



B Reference Counts 



An obvious extension of the work performed here is to incorporate an abstract 
notion of reference count for dynamically created cells. In the manner of [27] we 
could change the definition of Store (in Table 1) to have 

$S \in \mathrm{Store} = \mathrm{Cell} \to (\mathrm{Val} \times \mathrm{Pop})$
$p \in \mathrm{Pop} = \{\mathsf{O}, \mathsf{I}, \mathsf{M}\}$

Here the new Pop component denotes how many concrete locations may
simultaneously be described by the abstract reference cell: $\mathsf{O}$ means zero, $\mathsf{I}$ means at
most one, and $\mathsf{M}$ means arbitrarily many (including zero and one).

This makes it possible for the analysis sometimes to overwrite (as opposed to 
always augment) the value of a cell that is created or assigned. For this we need 
a new operation for adding a reference: 



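\[
S \oplus ((\varpi, m), W) = S[(\varpi, m) \mapsto (W'', p'')]
\]

where

\[
(W', p') = S(\varpi, m), \qquad
(W'', p'') = \begin{cases} (W \cup W', \mathsf{M}) & \text{if } p' \neq \mathsf{O} \\ (W, \mathsf{I}) & \text{if } p' = \mathsf{O} \end{cases}
\]

We also need a new operation for removing a reference:

\[
S \ominus (\varpi, m) = S[(\varpi, m) \mapsto (W'', p'')]
\]

where

\[
(W', p') = S(\varpi, m), \qquad
(W'', p'') = \begin{cases} (W', p') & \text{if } p' = \mathsf{M} \\ (\emptyset, \mathsf{O}) & \text{if } p' \neq \mathsf{M} \end{cases}
\]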

The necessary modifications to the analysis are shown in Table 8. 
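
The two operations can be sketched as follows in Python. The sketch assumes that abstract stores are dictionaries from cells to (value set, population) pairs and that a cell not yet present in the dictionary holds $(\emptyset, \mathsf{O})$; the names `add_ref`, `remove_ref` and the `POP_*` constants are hypothetical.

```python
POP_ZERO, POP_ONE, POP_MANY = "O", "I", "M"  # the three elements of Pop


def lookup(store, cell):
    """Cells never mentioned are taken to hold (empty set, O); an assumption."""
    return store.get(cell, (frozenset(), POP_ZERO))


def add_ref(store, cell, w):
    """S (+) (cell, W): overwrite if the cell describes no location yet,
    otherwise join the values and move to 'many'."""
    w_old, p_old = lookup(store, cell)
    if p_old == POP_ZERO:
        new = (frozenset(w), POP_ONE)
    else:
        new = (w_old | frozenset(w), POP_MANY)
    return {**store, cell: new}


def remove_ref(store, cell):
    """S (-) cell: forget the contents unless the cell may describe
    arbitrarily many locations."""
    w_old, p_old = lookup(store, cell)
    new = (w_old, p_old) if p_old == POP_MANY else (frozenset(), POP_ZERO)
    return {**store, cell: new}


cell = ("r", "m")
s1 = add_ref({}, cell, {"a"})                       # creation
s2 = add_ref(remove_ref(s1, cell), cell, {"b"})     # assignment, as in Table 8
print(s2[cell] == (frozenset({"b"}), POP_ONE))      # True
```

In the usage above the cell describes at most one location, so the assignment overwrites the old contents; had the population been $\mathsf{M}$, the removal would have been a no-op and the new value would merely have been added.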
Abstract. This paper proposes an extensional semantics-based formal
specification of secure information-flow properties in sequential programs 
based on representing degrees of security by partial equivalence relations 
(pers). The specification clarifies and unifies a number of specific cor- 
rectness arguments in the literature, and connections to other forms of 
program analysis. The approach is inspired by (and equivalent to) the use 
of partial equivalence relations in specifying binding-time analysis, and 
is thus able to specify security properties of higher-order functions and 
“partially confidential data”. We extend the approach to handle nonde- 
terminism by using powerdomain semantics and show how probabilistic 
security properties can be formalised by using probabilistic powerdomain 
semantics. 



1 Introduction 

1.1 Motivation 

You have received a program from an untrusted source. Let us call it company 
M. M promises to help you to optimise your personal financial investments, 
information about which you have stored in a database on your home computer. 
The software is free (for a limited time), under the condition that you permit a 
log-file containing a summary of your usage of the program to be automatically 
emailed back to the developers of the program (who claim they wish to determine 
the most commonly used features of their tool). Is such a program safe to use? 
The program must be allowed access to your personal investment information, 
and is allowed to send information, via the log-file, back to M. But how can you 
be sure that M is not obtaining your sensitive private financial information by 
cunningly encoding it in the contents of the innocent-looking log-file? This is an 
example of the problem of determining that the program has secure information 
flow. Information about your sensitive “high-security” data should not be able 
to propagate to the “low-security” output (the log-file). Traditional methods of
access control are of limited use here since the program has legitimate access to 
the database. 

This paper proposes an extensional semantics-based formal specification of 
secure information-flow properties in sequential programs based on representing 
degrees of security by partial equivalence relations (pers¹). The specification clarifies
and unifies a number of specific correctness arguments in the literature, and
connections to other forms of program analysis. The approach is inspired by and, 
in the deterministic case, equivalent to the use of partial equivalence relations 
in specifying binding-time analysis [HS91], and is thus able to specify security 
properties of higher-order functions and “partially confidential data” (e.g. one’s 
financial database could be deemed to be partially confidential if the number of 
entries is not deemed to be confidential even though the entries themselves are). 
We show how the approach can be extended to handle nondeterminism, and
illustrate how the various choices of powerdomain semantics affect the kinds of
security properties that can be expressed, ranging from termination-insensitive
properties (corresponding to the use of the Hoare (partial correctness) powerdo- 
main) to probabilistic security properties, obtained when one uses a probabilistic 
powerdomain. 



1.2 Background 

The study of information flow in the context of systems with multiple lev- 
els of confidentiality was pioneered by Denning [Den76,DD77] in an extension 
of Bell and LaPadula’s early work [BL76]. Denning’s approach is to apply a 
static analysis suitable for inclusion into a compiler. The basic idea is that 
security levels are represented as a lattice (for example the two point lattice 
PublicDomain < TopSecret). The aim of the static analysis is to ensure that
information from inputs, variables or processes of a given security level only flows
to outputs, variables or processes which have been assigned a higher or equal 
security level. 



1.3 Semantic Foundations of Information Flow Analysis 

In order to verify a program analysis or a specific proof of a program’s security one
must have a formal specification of what constitutes secure information flow. 
The value of a semantics-based specification for secure information flow is that 
it contributes significantly to the reliability of and the confidence in such activi- 
ties, and can be used in the systematic design of such analyses. Many approaches 
to Denning-style analyses (including the original articles) contain a fair degree 
of formalism but arguably are lacking a rigorous soundness proof. Volpano et al.
[VSI96] claim to give the first satisfactory treatment of soundness of Denning’s
analysis. Such a claim rests on the dissatisfaction with soundness arguments
based on an instrumented operational (e.g. [Ørb95]) or denotational semantics
(e.g. [MS92]), or on “axiomatic” approaches which define security in terms of a
program logic [AR80] without any models to relate the logic to the semantics 
of the programming language. The problem here is that an “instrumented se- 
mantics” or a “security logic” is just a definition, not subject to any further 

^ A Partial Equivalence relation is symmetric and transitive bnt not necessarily re- 
flexive 
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mathematical justification. McLean points out [McL90] in a related discussion 
about the (non language-specific) Bell and LaPadula model: 

One problem is that . . . they [the Bell LaPadula security properties] 
constitute a possible implementation of security, . . . , rather than an ab- 
stract specification of what all secure systems must satisfy. By concerning 
themselves with particular controls over files inside the computer, rather 
than limiting themselves to the relation between input and output, they 
make it harder to reason about the requirements, . . . 

This criticism points to more abstract, extensional notions of soundness, based 
on, for example, the idea of noninterference introduced in [GM82].



1.4 Semantics-based Models of Information Flow 

The problem of secure information flow, or “noninterference”, is now quite
mature, and very many specifications exist in the literature - see [McL94] for a
tutorial overview. Many approaches have been phrased in terms of abstract, 
and sometimes rather ad hoc models of computation. Only more recently have 
attempts been made to rephrase and compare various security conditions in 
terms of well-known semantic models, e.g. the use of labelled transition systems 
and bisimulation semantics in [FG94]. In this paper we consider the problem 
of information-flow properties of sequential systems, and use the framework of 
denotational semantics as our formal model of computation. Along the way we 
consider some relations to specific static analyses, such as the Security Lambda 
Calculus [HR98] and an alternative semantic condition for secure information 
flow proposed by Leino and Joshi [LJ98]. 



1.5 Overview 

The rest of the paper is organised as follows. Section 2 shows how the per- 
based condition for soundness of binding times analysis is also a model of secure 
information flow. We show how this provides insight into the treatment of higher- 
order functions and structured data. Section 3 shows how the approach can be 
adapted to the setting of a nondeterministic imperative language by appropriate 
use of a powerdomain-based semantics. We show how the choice of powerdomain 
(upper, lower or convex) affects the nature of the security condition. Section 4 
focuses on an alternative semantic specification due to Leino and Joshi. Mod- 
ulo some technicalities we show that Leino’s condition - and a family of similar 
conditions - are in agreement with, and can be represented using our form of 
specification. Section 5 considers the problem of preventing unwanted proba- 
bilistic information flows in programs. We show how this can be solved in the 
same framework by utilising a probabilistic semantics based on the probabilistic 
powerdomain [JP89]. Section 6 concludes.

2 A Per Model of Information Flow

In this section we introduce the way that partial equivalence relations (pers) can
be used to model dependencies in programs. The basic idea comes from Hunt’s use
of pers to model and construct abstract interpretations for strictness properties
in higher-order functional programs [Hun90,Hun91], and in particular its use to
model dependencies in binding-time analysis [HS91]. Related ideas already occur
in the denotational formulation of live-variable analysis [Nie90].

2.1 Binding Time Analysis as Dependency Analysis 

Given a description of the parameters in a program that will be known at par- 
tial evaluation time (called the static arguments), a binding-time analysis (BTA) 
must determine which parts of the program are dependent solely on these known 
parts (and therefore also known at partial evaluation time). The safety condi- 
tion for binding time analysis must ensure that there is no dependency between 
the dynamic (i.e., non-static) arguments and the parts of the program that are 
deemed to be static. Viewed in this way, binding time analysis is purely an 
analysis of dependencies.^ 

Dependencies in Security. In the security field, the property of absence of
unwanted dependencies is often called noninterference, after [GM82]. Many
problems in security come down to forms of dependency analysis. For example, in the
case of confidentiality, the aim is to show that the outputs of a program which 
are deemed to be of low confidentiality do not have any dependence on inputs 
of a higher degree of confidentiality. In the case of integrity (trust), one must 
ensure that the value of some trusted data does not depend on some untrusted 
source. 

Some intuitions about information flow. Let us consider a program modelled
as a function from some input domain to an output domain. Now consider the
following simple functions mapping inputs to outputs: $\mathit{snd} : D \times E \to E$ for some
sets (or domains) $D$ and $E$, and $\mathit{shift}$ and $\mathit{test}$, functions in $N \times N \to N \times N$
and $N \times N \to N$, defined by

$\mathit{snd}(x, y) = y$
$\mathit{shift}(x, y) = (x + y, y)$
$\mathit{test}(x, y) = \text{if } x > 0 \text{ then } y \text{ else } y + 1$

Now suppose that $(h, l)$ is a pair where $h$ is some high security information, and
$l$ is low, “public domain”, information. Without knowing what the actual
values $h$ and $l$ might be, we know that the result of applying the function $\mathit{snd}$
will be a low value, and, in the case that we have a pair of numbers, the result
of applying $\mathit{shift}$ will be a pair with a high first component and a low second
component.

^ Unfortunately, from the perspective of a partial evaluator, BTA is not purely a
matter of dependencies; in [HS95] it was shown that the pure dependency models of
[Lau89] and [HS91] are not adequate to ensure the safety of partial evaluation.

Note that the function test does not enjoy the same security property that 
snd does, since although it produces a value which is constructed from purely 
low-security components, the actual value is dependent on the first component 
of the input. This is what is known as an indirect information flow [Den76]. 

It is rather natural to think of these properties as “security types”:

$\mathit{snd} : \mathit{high} \times \mathit{low} \to \mathit{low}$
$\mathit{shift} : \mathit{high} \times \mathit{low} \to \mathit{high} \times \mathit{low}$
$\mathit{test} : \mathit{high} \times \mathit{low} \to \mathit{high}$

But what notion of “type”, and what interpretation of “high” and “low”, can
formalise these more intuitive type statements? Interpreting types as sets of
values is not adequate to model “high” and “low”. To track degrees of dependence
between inputs and outputs we need a more dynamic view of a type as a degree 
of variation. We must vary (parts of) the input and observe which (parts of) the 
output vary. For the application to confidentiality we want to determine if there 
is possible information leakage from a high level input to the parts of an output 
which are intended to be visible to a low security observer. We can detect this 
by observing whether the “low” parts of the output vary in any way as we vary 
the high input. 

The simple properties of the functions $\mathit{snd}$ and $\mathit{shift}$ described above can be
captured formally by the following formulae:

$\forall x, x', y.\ \mathit{snd}(x, y) = \mathit{snd}(x', y)$   (1)

$\forall x, x', y.\ \mathit{snd}(\mathit{shift}(x, y)) = \mathit{snd}(\mathit{shift}(x', y))$   (2)

Indeed, this kind of formula forms the core of the correctness arguments for the
security analyses proposed by, e.g., Volpano and Smith et al. [VSI96,SV98], and
also for the extensional correctness proofs for the core of the SLam calculus [HR98].
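
As an informal illustration of formulae (1) and (2), the following Python sketch varies only the high input and looks for a change in the observed output. It is not part of the original development: the finite sample `SAMPLE`, the helper `find_leak`, and the name `leaky_test` (standing for the function test above) are assumptions made for this example, and a finite search can only refute, never prove, the universally quantified property.

```python
from itertools import product

def snd(x, y):
    return y

def shift(x, y):
    return (x + y, y)

def leaky_test(x, y):
    # The function called "test" in the text: the result depends on x.
    return y if x > 0 else y + 1

def find_leak(f, observe, highs, lows):
    """Return (x, x2, y) such that changing only the high input x to x2
    changes the observed part of the output, or None if the sample has no
    such witness."""
    for x, x2, y in product(highs, highs, lows):
        if observe(f(x, y)) != observe(f(x2, y)):
            return (x, x2, y)
    return None

SAMPLE = range(5)          # a small finite sample of N (assumption)

def whole(value):          # the observer sees the entire output
    return value

def second(pair):          # the observer sees only the second component
    return pair[1]

print(find_leak(snd, whole, SAMPLE, SAMPLE))         # formula (1)
print(find_leak(shift, second, SAMPLE, SAMPLE))      # formula (2)
print(find_leak(leaky_test, whole, SAMPLE, SAMPLE))  # indirect flow
```

On this sample no witness is found for snd or for the second component of shift, while a witness is found for test, matching the discussion of indirect information flow.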

High and Low as Equivalence Relations. We show how we can interpret “security
types” in general as partial equivalence relations. We will interpret high (for
values in $D$) as the equivalence relation $\mathit{All}_D$, and low as the relation $\mathit{Id}_D$, where
for all $x, x' \in D$:

$x\ \mathit{All}_D\ x'$   (3)

$x\ \mathit{Id}_D\ x' \iff x = x'$.   (4)

For a function $f : D \to E$ and binary relations $P \in \mathit{Rel}(D)$ and $Q \in \mathit{Rel}(E)$, we
write $f : P \to Q$ iff

$\forall x, x' \in D.\ x\ P\ x' \implies (f\,x)\ Q\ (f\,x')$.

For binary relations $P$, $Q$ we define the relation $P \times Q$ by:

$(x, y)\ P \times Q\ (x', y') \iff x\ P\ x'\ \ \&\ \ y\ Q\ y'$.

Now the security property of $\mathit{snd}$ described by (1) can be captured by

$\mathit{snd} : \mathit{All}_D \times \mathit{Id}_E \to \mathit{Id}_E$,

and (2) is given by

$\mathit{shift} : \mathit{All}_N \times \mathit{Id}_N \to \mathit{All}_N \times \mathit{Id}_N$.
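
The relational reading of these typings can be sketched in Python by representing a relation as a two-argument predicate. This is only an illustration under stated assumptions: the names `times`, `has_type`, `leaky_test` and the finite domain `PAIRS` are hypothetical and not taken from the paper, and the quantifier over $D$ is replaced by a loop over a small finite sample.

```python
from itertools import product

def All(a, b):
    """high: every pair of values is related."""
    return True

def Id(a, b):
    """low: only equal values are related."""
    return a == b

def times(P, Q):
    """Product relation: (x, y) (P x Q) (x2, y2) iff x P x2 and y Q y2."""
    return lambda a, b: P(a[0], b[0]) and Q(a[1], b[1])

def has_type(f, P, Q, domain):
    """f : P -> Q, checked over a finite domain:
    P-related inputs must be mapped to Q-related outputs."""
    return all(Q(f(a), f(b))
               for a, b in product(domain, repeat=2) if P(a, b))

N = range(5)                      # finite sample of N (assumption)
PAIRS = list(product(N, N))

def snd(p):
    return p[1]

def shift(p):
    return (p[0] + p[1], p[1])

def leaky_test(p):                # the function "test" of the text
    return p[1] if p[0] > 0 else p[1] + 1

high_low = times(All, Id)

print(has_type(snd, high_low, Id, PAIRS))
print(has_type(shift, high_low, high_low, PAIRS))
print(has_type(leaky_test, high_low, Id, PAIRS))
print(has_type(leaky_test, high_low, All, PAIRS))
```

The third check fails because test has an indirect flow from the high component, so on this sample it only receives the weaker result type high.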



2.2 From Equivalence Relations to Pers 

We have seen how the equivalence relations All and Id may be used to describe 
security “properties” high and low. It turns out that these are exactly the same as 
the interpretations given to the notions “dynamic” and “static” given in [HS91]. 
This means that the binding-time analysis for a higher-order functional language 
can also be read as a security information-flow analysis. This connection between
security and binding time analysis is already folklore (see, e.g., [TK97] for a
comparison of a particular security type system and a particular binding-time
analysis, and [DRH95], which shows how the incorporation of indirect information
flows from Denning’s security analysis can improve binding time analyses).

It is worth highlighting a few of the pertinent ideas from [HS91]. Beginning 
with the equivalence relations All and Id to describe high and low respectively, 
there are two important extensions to the basic idea in order to handle struc- 
tured data types and higher-order functions. Both of these ideas are handled 
by the analysis of [HS91] which rather straightforwardly extends Launchbury’s 
projection-based binding-time analysis [Lau89] to higher types. To some extent 
[HS91] anticipates the treatment of partially-secure data types in the SLam
calculus [HR98], and the use of logical relations in their proof of noninterference.

For structured data it is useful to have more refined notions of security than
just high and low; we would like to be able to model various degrees of security.
For example, we may have a list of records containing name-password pairs.
Assuming passwords are considered high, we might like to express the fact that
although the whole list cannot be considered low, it can be considered as a
$(\mathit{low} \times \mathit{high})$ list. Constructing equivalence relations which represent such
properties is straightforward - see [HS91] for examples (which are adapted directly
from Launchbury’s work), and [Hun91] for a more general treatment of finite 
lattices of “binding times” for recursive types. 

To represent security properties of higher-order functions we use a less re- 
stricted class of relations than the equivalence relations. A partial equivalence 
relation (per) on a set D is a binary relation on D which is symmetric and 
transitive. If $P$ is such a per let $|P|$ denote the domain of $P$, given by

$|P| = \{x \in D \mid x\ P\ x\}$.

Note that the domain and range of a per $P$ are both equal to $|P|$ (so for any
$x, y \in D$, if $x\ P\ y$ then $x\ P\ x$ and $y\ P\ y$), and that the restriction of $P$ to $|P|$
is an equivalence relation. Clearly, an equivalence relation is just a per which
is reflexive (so $|P| = D$). Partial equivalence relations over various applicative
structures have been used to construct models of the polymorphic lambda
calculus (see, for example, [AP90]). As far as we are aware, the first use of pers in
static program analysis is that presented in [Hun90].

For a given set $D$ let $\mathit{Per}(D)$ denote the partial equivalence relations over
$D$. $\mathit{Per}(D)$ is a meet semi-lattice, with meets given by set-intersection, and top
element $\mathit{All}$.

Given pers $P \in \mathit{Per}(D)$ and $Q \in \mathit{Per}(E)$, we may construct a new per
$(P \to Q) \in \mathit{Per}(D \to E)$ defined by:

$f\ (P \to Q)\ g \iff \forall x, x' \in D.\ x\ P\ x' \implies (f\,x)\ Q\ (g\,x')$.

If $P$ is a per, we will write $x : P$ to mean $x \in |P|$. This notation and the above
definition of $P \to Q$ are consistent with the notation used previously, since now

$f : P \to Q \iff f\ (P \to Q)\ f \iff \forall x, x' \in D.\ x\ P\ x' \implies (f\,x)\ Q\ (f\,x')$.

Note that even if $P$ and $Q$ are both total (i.e., equivalence relations), $P \to Q$
may be partial. A simple example is $\mathit{All} \to \mathit{Id}$. If $f : \mathit{All} \to \mathit{Id}$ then we know
that given a high input, $f$ returns a low output. A constant function $\lambda x.42$ has
this property, but clearly not all functions satisfy this.
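
The partiality of $\mathit{All} \to \mathit{Id}$ can be observed in a small Python sketch. The helper names `arrow` and `in_domain`, and the finite set `D`, are assumptions of this example; the quantification over $D$ is approximated by a finite loop.

```python
from itertools import product

def All(a, b):
    return True

def Id(a, b):
    return a == b

def arrow(P, Q, domain):
    """The per P -> Q on functions, with x and x' ranging over a finite domain."""
    def related(f, g):
        return all(Q(f(x), g(y))
                   for x, y in product(domain, repeat=2) if P(x, y))
    return related

def in_domain(P, x):
    """x : P, i.e. x is in |P|, i.e. x P x."""
    return P(x, x)

D = range(4)                      # finite stand-in for the set D (assumption)
high_to_low = arrow(All, Id, D)

def const42(x):
    return 42

def const7(x):
    return 7

def ident(x):
    return x

print(in_domain(high_to_low, const42))   # a constant function is in |All -> Id|
print(in_domain(high_to_low, ident))     # the identity function is not
print(high_to_low(const42, const7))      # distinct constants are unrelated
```

So the relation is not reflexive on all functions, which is exactly why pers rather than equivalence relations are needed at function types.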

2.3 Observations on Strictness and Termination Properties 

We are interested in the security properties of functions which are the
denotations of programs (in a Scott-style denotational semantics), and so there are some
termination issues which we should address. The formulation of security properties
given above is sensitive to termination. Consider, for example, the following
function $f : N_\bot \to N_\bot$:

$f = \lambda x.\ \text{if } x > 0 \text{ then } x \text{ else } f\,x$

Clearly, if the argument is high then the result must be high. Now consider the
security properties of the function $g \circ f$ where $g$ is the constant function $g = \lambda x.2$.
We might like to consider that $g$ has type $\mathit{high} \to \mathit{low}$. However, if function
application is considered to be strict (as in ML) then $g$ is not in
$|\mathit{All}_{N_\bot} \to \mathit{Id}_{N_\bot}|$ since $\bot\ \mathit{All}_{N_\bot}\ 1$ but $g(\bot) \ne g(1)$. Hence the function $g \circ f$ does not have
security type $\mathit{high} \to \mathit{low}$ (in our semantic interpretation). This is correct, since
on termination of an application of this function, the low observer will have
learned that the value of the high argument was positive.

The specific security analysis of, e.g., the first calculus of Smith and Volpano
[SV98] is termination sensitive - and this is enforced by a rather sweeping
measure: all “while”-loop conditions must be low and all “while”-loop bodies must
be low commands.

On the other hand, the type system of the SLam calculus [HR98] is not
termination sensitive in general. This is due to the fact that it is based on a
call-by-value semantics, and indeed the composition $g \circ f$ could be considered
to have a security type corresponding to “$\mathit{high} \to \mathit{low}$”. The correctness proof
for noninterference carefully avoids saying anything about nonterminating
executions. What is perhaps worth noting here is that had they chosen a non-strict
semantics for application then the same type-system would yield termination
sensitive security properties! So we might say that lazy programs are intrinsically
more secure than strict ones. This phenomenon is closely related to properties of
parametrically polymorphic functions [Rey83]^. From the type of a polymorphic
function one can predict certain properties about its behaviour - the so-called
“free theorems” of the type [Wad89]. However, in a strict language one must add
an additional condition in order that the theorems hold: the functions must be
bottom-reflecting ($f(a) = \bot \implies a = \bot$). The same side condition can be added
to make, e.g., the type system of the SLam calculus termination-sensitive.

To make this observation precise we introduce one further constructor for
pers. If $R \in \mathit{Per}(D)$ then we will also let $R$ denote the corresponding per on $D_\bot$
without explicit injection of elements from $D$ into elements in $D_\bot$. We will write
$R_\bot$ to denote the relation in $\mathit{Per}(D_\bot)$ which naturally extends $R$ by $\bot\ R_\bot\ \bot$.

Now we can be more precise about the properties of $g$ under a strict
(call-by-value) interpretation: $g : (\mathit{All}_N)_\bot \to (\mathit{Id}_N)_\bot$, which expresses that $g$ is a
constant function, modulo strictness. More informatively we can say that
$g : \mathit{All}_N \to \mathit{Id}_N$, which expresses that $g$ is a non-bottom constant function.
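
The role of strictness can be made concrete with a small Python model in which `None` stands for $\bot$. This is a sketch under assumptions: the encoding of $\bot$ as `BOT = None`, the helpers `strict`, `lifted` and `has_type`, and the finite sample `N_BOT` are all introduced for this example only.

```python
from itertools import product

BOT = None  # models the bottom element, i.e. nontermination (assumption)

def strict(fn):
    """Make fn strict: a bottom argument yields a bottom result."""
    return lambda x: BOT if x is BOT else fn(x)

# Denotation of f = \x. if x > 0 then x else f x: diverges unless x > 0.
f = strict(lambda x: x if x > 0 else BOT)
# The constant function g = \x. 2 under strict application.
g = strict(lambda x: 2)

def g_after_f(x):
    return g(f(x))

def all_n(a, b):
    """All_N read as a per on N_bot: relates all proper values, never bottom."""
    return a is not BOT and b is not BOT

def id_n(a, b):
    """Id_N read as a per on N_bot."""
    return a is not BOT and a == b

def lifted(R):
    """R_bot: extends R by relating bottom to bottom."""
    return lambda a, b: (a is BOT and b is BOT) or R(a, b)

def all_total(a, b):
    """All on N_bot itself: bottom is related to every value."""
    return True

def has_type(fn, P, Q, domain):
    return all(Q(fn(a), fn(b))
               for a, b in product(domain, repeat=2) if P(a, b))

N_BOT = [BOT, 0, 1, 2, 3]   # finite sample of N_bot (assumption)

print(has_type(g, all_total, lifted(id_n), N_BOT))              # fails
print(has_type(g, lifted(all_n), lifted(id_n), N_BOT))          # holds
print(has_type(g, all_n, id_n, N_BOT))                          # holds
print(has_type(g_after_f, lifted(all_n), lifted(id_n), N_BOT))  # fails
```

The last check fails because the inputs 0 and 1 are related as high values but lead to a diverging and a terminating run respectively, which a low observer can distinguish.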

It is straightforward to express per properties in a subtype system of
compositional rules (although we don’t claim that such a system would be in any
sense complete). Pleasantly, all the expected subtyping rules are sound when
types are interpreted as pers and the subtyping relation is interpreted as subset 
inclusion of relations. For the abstract interpretation presented in [HS91] this 
has already been undertaken by e.g. Jensen [Jen92] and Hankin and Le Metayer 
[HL94]. 

3 Nondeterministic Information Flow 

In this section we show how the per model of security can be extended to describe 
nondeterministic computations. We see nondeterminism as an important feature 
as it arises naturally when considering the semantics of a concurrent language 
(although the treatment of a concurrent language remains outside the scope of 
the present paper.) 

In order to focus on the essence of the problem we consider a very simplified 
setting - the analysis of commands in some simple imperative language contain- 
ing a nondeterministic choice operator. We assume that there is some discrete 
(i.e., unordered) domain St of states (which might be viewed as finite maps from 
variables to discrete values, or simply just a tuple of values). 

^ Not forgetting that the use of Pers in static analysis was inspired, in part, by Abadi
and Plotkin’s Per model of polymorphic types [AP90].

3.1 Secure Commands in a Deterministic Setting

In the deterministic setting we can take the denotation of a command $C$, written
$[\![C]\!]$, to be a function in $[\mathit{St}_\bot \to \mathit{St}_\bot]$, where by $[D_\bot \to E_\bot]$ we mean the set of
strict and continuous maps between domains $D_\bot$ and $E_\bot$. Note that we could
equally well take the set of all (trivially continuous) functions in $\mathit{St} \to \mathit{St}_\bot$,
which is isomorphic.

Now suppose that the state is just a simple partition into a high-security half
and a low-security half, so the set of states is the product $\mathit{St}_{high} \times \mathit{St}_{low}$. Then
we might define a command $C$ to be secure if no information from the high part
of the state can leak into the low part:

$C$ is secure $\iff [\![C]\!] : (\mathit{All} \times \mathit{Id})_\bot \to (\mathit{All} \times \mathit{Id})_\bot$   (5)

This is equivalent to saying that $[\![C]\!] : (\mathit{All} \times \mathit{Id}) \to (\mathit{All} \times \mathit{Id})_\bot$, since we only
consider strict functions. Note that this does not imply that $[\![C]\!]$ terminates, but
what it does imply is that the termination behaviour is not influenced by the
values of the high part of the state. It is easy to see that the sequential
composition of secure commands is a secure command, since firstly, the denotation
of the sequential composition of commands is just the function-composition of
denotations, and secondly, in general for functions $g : D \to E$ and $f : E \to F$,
and pers $P \in \mathit{Per}(D)$, $Q \in \mathit{Per}(E)$ and $R \in \mathit{Per}(F)$ it is easy to verify the
soundness of the inference rule:

$\dfrac{g : P \to Q \qquad f : Q \to R}{f \circ g : P \to R}$

3.2 Powerdomain Semantics for Nondeterminism 

A standard approach to giving meaning to a nondeterministic language - for 
example Dijkstra’s guarded command language - is to interpret a command as 
a mapping which yields a set of results. However, when defining an ordering on 
the results in order to obtain a domain, there is a tension between the internal 
order of $\mathit{St}_\bot$ and the subset order of the powerset. This is resolved by considering
a suitable powerdomain structure [Plo76,Smy78]. The powerdomains are built
from a domain $D$ by starting with the finitely generated (f.g.) subsets of $D_\bot$
(those non-empty subsets which are either finite, or contain $\bot$), and a preorder
on these sets. Quotienting the f.g. sets using the associated equivalence relation
yields the corresponding domain. We give each construction in turn, and give an
idea about the corresponding discrete powerdomain $\mathcal{P}[\mathit{St}_\bot]$.

- Lower (Hoare) powerdomain. Let $u \sqsubseteq_L v$ iff $\forall x \in u.\ \exists y \in v.\ x \sqsubseteq y$. In this case
  the induced discrete powerdomain $\mathcal{P}_L[\mathit{St}_\bot]$ is isomorphic to the powerset of
  $\mathit{St}$ ordered by subset inclusion. This means that the domain $[\mathit{St}_\bot \to \mathcal{P}_L[\mathit{St}_\bot]]$
  is isomorphic to all subsets of $\mathit{St} \times \mathit{St}$ - i.e. the relational semantics.

- Upper (Smyth) powerdomain. The upper ordering on f.g. sets $u$, $v$ is given by

  $u \sqsubseteq_U v \iff \forall y \in v.\ \exists x \in u.\ x \sqsubseteq y$.

  Here the induced discrete powerdomain $\mathcal{P}_U[\mathit{St}_\bot]$ is isomorphic to the set of
  finite non-empty subsets of $\mathit{St}$ together with $\mathit{St}_\bot$ itself, ordered by superset
  inclusion.

- Convex (Plotkin) powerdomain. Let $u \sqsubseteq_C v$ iff $u \sqsubseteq_U v$ and $u \sqsubseteq_L v$. This is
  also known as the Egli-Milner ordering. The resulting powerdomain $\mathcal{P}_C[\mathit{St}_\bot]$
  is isomorphic to the f.g. subsets of $\mathit{St}_\bot$, ordered by:

  $A \sqsubseteq_C B \iff$ either $\bot \notin A\ \&\ A = B$, or $\bot \in A\ \&\ A \setminus \{\bot\} \subseteq B$.
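
The three preorders can be tried out on small sets over a flat domain. The following Python sketch is illustrative only: it encodes $\bot$ as `None`, works with finite sets, and uses hypothetical names (`lower_leq`, `upper_leq`, `convex_leq`, `egli_milner`). It also checks, on a tiny universe, that the convex preorder agrees with the explicit characterisation given in the last item.

```python
from itertools import combinations

BOT = None  # bottom of the flat domain St_bot (modelling assumption)

def leq(x, y):
    """Flat order: bottom is below everything, proper elements only below themselves."""
    return x is BOT or x == y

def lower_leq(u, v):     # Hoare preorder
    return all(any(leq(x, y) for y in v) for x in u)

def upper_leq(u, v):     # Smyth preorder
    return all(any(leq(x, y) for x in u) for y in v)

def convex_leq(u, v):    # Egli-Milner preorder
    return lower_leq(u, v) and upper_leq(u, v)

def egli_milner(a, b):
    """Explicit description of the convex order on subsets of a flat domain."""
    if BOT not in a:
        return a == b
    return (a - {BOT}) <= b

def nonempty_subsets(elements):
    return [frozenset(c)
            for r in range(1, len(elements) + 1)
            for c in combinations(elements, r)]

SETS = nonempty_subsets([BOT, 0, 1])

# In the lower preorder, adding bottom to a set makes no difference.
print(lower_leq({BOT, 1}, {1}) and lower_leq({1}, {BOT, 1}))
# In the upper preorder, all sets containing bottom are equivalent.
print(upper_leq({BOT, 1}, {BOT, 0}) and upper_leq({BOT, 0}, {BOT, 1}))
# The convex preorder distinguishes {bottom, 1} from {1}.
print(convex_leq({BOT, 1}, {1}), convex_leq({1}, {BOT, 1}))
# Agreement with the explicit characterisation on this small universe.
print(all(convex_leq(a, b) == egli_milner(a, b) for a in SETS for b in SETS))
```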

A few basic properties and definitions on powerdomains will be needed. For
each powerdomain constructor $\mathcal{P}[-]$ define the order-preserving “unit” map
$\eta_D : D_\bot \to \mathcal{P}[D_\bot]$ which takes each element $a \in D$ into (the powerdomain
equivalence class of) the singleton set $\{a\}$. For each function $f \in [D_\bot \to \mathcal{P}[E_\bot]]$
there exists a unique extension of $f$, denoted $f^*$, where $f^* \in [\mathcal{P}[D_\bot] \to \mathcal{P}[E_\bot]]$,
which is the unique mapping such that

$f = f^* \circ \eta$.

In the particular setting of the denotations of commands, it is worth noting
that $[\![C_1; C_2]\!]$ would be given by:

$[\![C_1; C_2]\!] = [\![C_2]\!]^* \circ [\![C_1]\!]$.



3.3 Pers on Powerdomains 

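Given one of the discrete powerdomains, $\mathcal{P}[\mathit{St}_\bot]$, we will need a “logical” way to
lift a per $P \in \mathit{Per}(\mathit{St}_\bot)$ to a per in $\mathit{Per}(\mathcal{P}[\mathit{St}_\bot])$.

Definition 1 For each $R \in \mathit{Per}(D_\bot)$ and each choice of powerdomain $\mathcal{P}[-]$,
let $\mathcal{P}[R]$ denote the relation on $\mathcal{P}[D_\bot]$ given by

$A\ \mathcal{P}[R]\ B \iff \forall a \in A.\ \exists b \in B.\ a\ R\ b\ \ \&\ \ \forall b \in B.\ \exists a \in A.\ a\ R\ b$

It is easy to check that $\mathcal{P}[R]$ is a per, and in particular that $\mathcal{P}[\mathit{Id}_{D_\bot}] = \mathit{Id}_{\mathcal{P}[D_\bot]}$.
Henceforth we shall restrict our attention to the semantics of simple
commands, and hence the three discrete powerdomains $\mathcal{P}[\mathit{St}_\bot]$.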
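
Definition 1 translates almost literally into code when sets are finite. The sketch below is an illustration under assumptions: states are pairs `(h, l)`, $\bot$ is encoded as `None`, and the names `lift_to_sets` and `low_equiv` are hypothetical. It works directly on representative sets and ignores the quotient by the powerdomain equivalence.

```python
from itertools import combinations

BOT = None  # bottom element (modelling assumption)

def low_equiv(s, t):
    """(All x Id)_bot on states (h, l): bottom relates only to bottom,
    proper states are related iff their low components agree."""
    if s is BOT or t is BOT:
        return s is BOT and t is BOT
    return s[1] == t[1]

def identity(a, b):
    return a == b

def lift_to_sets(R):
    """P[R]: every element of A is R-related to some element of B, and vice versa."""
    def related(A, B):
        return (all(any(R(a, b) for b in B) for a in A) and
                all(any(R(a, b) for a in A) for b in B))
    return related

rel = lift_to_sets(low_equiv)
print(rel({(0, 1), (1, 1)}, {(5, 1)}))   # same low values on both sides
print(rel({(0, 1)}, {(0, 2)}))           # low values differ
print(rel({(0, 1), BOT}, {(0, 1)}))      # possible divergence on one side only

# P[Id] coincides with equality of sets on a small universe.
universe = [frozenset(c) for r in range(1, 4)
            for c in combinations([BOT, 0, 1], r)]
print(all(lift_to_sets(identity)(A, B) == (A == B)
          for A in universe for B in universe))
```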

Proposition 1 For any $f \in [\mathit{St}_\bot \to \mathcal{P}[\mathit{St}_\bot]]$ and any $R, S \in \mathit{Per}(\mathit{St}_\bot)$,

$f : R \to \mathcal{P}[S] \iff f^* : \mathcal{P}[R] \to \mathcal{P}[S]$

From this it easily follows that the following inference rule is sound:

$\dfrac{[\![C_1]\!] : P \to \mathcal{P}[Q] \qquad [\![C_2]\!] : Q \to \mathcal{P}[R]}{[\![C_1; C_2]\!] : P \to \mathcal{P}[R]}$

3.4 The Security Condition

We will investigate the implications of the security condition under each of the
powerdomain interpretations. Let us suppose that, as before, the state is
partitioned into a high part and a low part: $\mathit{St} = \mathit{St}_{high} \times \mathit{St}_{low}$. With respect to a
particular choice of powerdomain let the security “type” $C : \mathit{high} \times \mathit{low} \to \mathit{high} \times \mathit{low}$
denote the property

$[\![C]\!] : (\mathit{All} \times \mathit{Id})_\bot \to \mathcal{P}[(\mathit{All} \times \mathit{Id})_\bot]$.

In this case we say that C is secure. Now we explore the implications of this 
definition on each of the possible choices of powerdomain: 

1. In the lower powerdomain, the security condition describes in a weak sense
   termination-insensitive information flow. For example, the program

   if h = 0 then skip [] loop else skip

   ($h$ is the high part of the state) is considered secure under this interpretation
   but the termination behaviour is influenced by $h$ (it can fail to terminate
   only when $h = 0$).

2. In the upper powerdomain nontermination is considered catastrophic. This
   interpretation seems completely unsuitable for security unless one only
   considers programs which are “totally correct” - i.e. which must terminate on
   their intended domain. Otherwise, a possible nonterminating computation
   path will mask any other insecure behaviours a term might exhibit. This
   means that for any program $C$, the program C [] loop is secure!

3. The convex powerdomain gives the appropriate generalisation of the
   deterministic case in the sense that it is termination sensitive, and does not have
   the shortcomings of the upper powerdomain interpretation.
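
The three cases above can be replayed on a toy model. The following Python sketch rests on several assumptions that are not in the paper: states are pairs `(h, l)` over small finite ranges, $\bot$ is `None`, a command is a function from a proper state to a set of outcomes, and each powerdomain is represented by a canonical choice of representative (`lower` adds $\bot$, `upper` collapses any set containing $\bot$ to all of $\mathit{St}_\bot$, `convex` keeps the set as it is).

```python
from itertools import product

BOT = None
HIGHS, LOWS = range(3), range(3)           # finite toy state space (assumption)
STATES = list(product(HIGHS, LOWS))
ST_BOT = frozenset(STATES) | {BOT}

def low_equiv(s, t):
    """(All x Id)_bot on states (h, l)."""
    if s is BOT or t is BOT:
        return s is BOT and t is BOT
    return s[1] == t[1]

def lift_to_sets(R):
    def related(A, B):
        return (all(any(R(a, b) for b in B) for a in A) and
                all(any(R(a, b) for a in A) for b in B))
    return related

# Canonical representatives of an outcome set in each discrete powerdomain.
def lower(A):
    return frozenset(A) | {BOT}                    # divergence is invisible

def upper(A):
    return ST_BOT if BOT in A else frozenset(A)    # divergence swamps everything

def convex(A):
    return frozenset(A)                            # divergence is observable

def secure(cmd, normalise):
    """[[C]] : (All x Id)_bot -> P[(All x Id)_bot], checked on proper states
    (strictness takes care of the bottom input)."""
    rel = lift_to_sets(low_equiv)
    return all(rel(normalise(cmd(s)), normalise(cmd(t)))
               for s, t in product(STATES, repeat=2) if low_equiv(s, t))

def maybe_loop(s):        # if h = 0 then skip [] loop else skip
    return {s, BOT} if s[0] == 0 else {s}

def leak(s):              # l := h
    return {(s[0], s[0])}

def leak_or_loop(s):      # (l := h) [] loop
    return leak(s) | {BOT}

print(secure(maybe_loop, lower), secure(maybe_loop, convex))
print(secure(leak, upper), secure(leak_or_loop, upper))
print(secure(leak_or_loop, convex))
```

In this model the first program is accepted under the lower interpretation but rejected under the convex one, and the plainly insecure assignment becomes "secure" under the upper interpretation once a possible loop is added.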

4 Relation to an Equational Characterisation 

In this section we relate the Per-based security condition to a proposal by Leino 
and Joshi [LJ98]. Following their approach, assume for simplicity we have
programs with just two variables: $h$ and $l$ of high and low secrecy respectively.
Assume that the state is simply a pair, where $h$ refers to the first projection and
$l$ is the second projection.

In [LJ98] the security condition for a program $C$ is defined by

$\mathit{HH}; C; \mathit{HH} = C; \mathit{HH}$,

where “=” stands for semantic equality (the style of semantic specification is
left unfixed), and $\mathit{HH}$ is the program that “assigns to $h$ arbitrary values” -
aka “Havoc on H”. We will refer to this equation as the equational security
condition. Intuitively, the equation says that we cannot learn anything about
the initial values of the high variables by variation of the low security variables.
The postfix occurrences of $\mathit{HH}$ on each side mean that we are only interested
in the final value of $l$. The prefix $\mathit{HH}$ on the left-hand side means that the two
programs are equal if the final value of $l$ does not depend on the initial value of
$h$.

In relating the equational security condition to pers we must first decide
upon the denotation of $\mathit{HH}$. Here we run into some potential problems since it
is necessary in [LJ98] that $\mathit{HH}$ always terminates, but nevertheless exhibits
unbounded nondeterminism. Although this appears to pose no problems in [LJ98]
(in fact it goes without mention), to handle this we would need to work with
non-$\omega$-continuous semantics, and powerdomains for unbounded nondeterminism.
Instead, we side-step the issue by assuming that the domain of $h$, $\mathit{St}_{high}$, is finite.



4.1 Equational Security and Projection Analysis 

A first observation is that the equational security condition is strikingly
similar to the well-known form of static analysis for functional programs known
as projection analysis [WH87]. Given a function $f$, a projection analysis aims
to find projections (continuous lower closure operators on the domain) $\alpha$ and $\beta$
such that

$\beta \circ f \circ \alpha = \beta \circ f$

For (generalised) strictness analysis and dead-variable analysis, one is given $\beta$,
and $\alpha$ is to be determined; for binding time analysis [Lau89] it is a forwards
analysis problem: given $\alpha$ one must determine some $\beta$.

For strict functions (e.g., the denotations of commands) projection analysis 
is not so readily applicable. However, in the convex powerdomain HH is rather 
projection-like, since it effectively hides all information about the high variable; 
in fact it is an embedding (an upper closure operator) so the connection is rather 
close. 



4.2 The Equational Security Condition Is Subsumed by the Per 
Security Condition 

Hunt [Hun90] showed that projection properties of the form $\beta \circ f \circ \alpha = \beta \circ f$ could
be expressed naturally as a per property of the form $f : R_\alpha \to R_\beta$ for equivalence
relations derived from $\alpha$ and $\beta$ by relating elements which get mapped to the
same point by the corresponding projection.

Using the same idea we can show that the per-based security condition
subsumes the equational specification in a similar manner.

We will establish the following: 

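Theorem 1. For any command $C$

$[\![\mathit{HH}; C; \mathit{HH}]\!] = [\![C; \mathit{HH}]\!]$ iff $C : \mathit{high} \times \mathit{low} \to \mathit{high} \times \mathit{low}$.

The idea will be to associate an equivalence relation to the function $\mathit{HH}$.
More generally, for any command $C$ let $\mathit{ker}(C)$, the kernel of $C$, denote the
relation on $\mathcal{P}[\mathit{St}_\bot]$ satisfying

$s_1\ \mathit{ker}(C)\ s_2 \iff [\![C]\!]s_1 = [\![C]\!]s_2$.

Define the extension of $\mathit{ker}(C)$ by

$A\ \mathit{ker}^*(C)\ B \iff [\![C]\!]^*A = [\![C]\!]^*B$.

Recall the per interpretation of the type signature of $C$.

$C : \mathit{high} \times \mathit{low} \to \mathit{high} \times \mathit{low} \iff [\![C]\!] : (\mathit{All} \times \mathit{Id})_\bot \to \mathcal{P}[(\mathit{All} \times \mathit{Id})_\bot]$.

Observe that $(\mathit{All} \times \mathit{Id})_\bot = \mathit{ker}(\mathit{HH})$ since for any $h, l, h', l'$ it holds that
$[\![\mathit{HH}]\!](h, l) = [\![\mathit{HH}]\!](h', l')$ iff $l = l'$ iff $(h, l)\ (\mathit{All} \times \mathit{Id})_\bot\ (h', l')$.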

The proof of the theorem is based on this observation and on the following 
two facts: 

- $\mathcal{P}[(\mathit{All} \times \mathit{Id})_\bot] = \mathit{ker}^*(\mathit{HH})$ and

- $[\![\mathit{HH}; C; \mathit{HH}]\!] = [\![C; \mathit{HH}]\!] \iff [\![C]\!] : \mathit{ker}(\mathit{HH}) \to \mathit{ker}^*(\mathit{HH})$.

Let us first prove the latter fact by proving a more general statement similar
to Proposition 3.1.5 from [Hun91] (the correspondence between projections and
per-analysis). Note that we do not use the specifics of the convex powerdomain
semantics here, so the proof is valid for any of the three choices of powerdomain. 

Theorem 2. Let us say that a command $B$ is idempotent iff $[\![B; B]\!] = [\![B]\!]$.
For any commands $C$ and $D$, and any idempotent command $B$

$[\![B; C; D]\!] = [\![C; D]\!] \iff [\![C]\!] : \mathit{ker}(B) \to \mathit{ker}^*(D)$

Corollary. Since $[\![\mathit{HH}]\!]$ is idempotent we can conclude that

$[\![\mathit{HH}; C; \mathit{HH}]\!] = [\![C; \mathit{HH}]\!] \iff [\![C]\!] : \mathit{ker}(\mathit{HH}) \to \mathit{ker}^*(\mathit{HH})$.

It remains to establish the first fact.

Theorem 3. $\mathcal{P}[(\mathit{All} \times \mathit{Id})_\bot] = \mathit{ker}^*(\mathit{HH})$

The proofs are given in the full version of the paper [SS99]. Thus, the
equational and per security conditions in this simple case are equivalent.
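
The agreement of the two conditions can be checked by brute force on a toy state space. The sketch below is an illustration only and makes simplifying assumptions that go beyond the text: all commands terminate (so $\bot$ is omitted), commands are modelled as functions from a state `(h, l)` to a set of states, and the example commands and helper names (`seq`, `havoc_high`, `equational_secure`, `per_secure`) are hypothetical.

```python
from itertools import product

HIGHS, LOWS = range(3), range(3)     # finite St_high, as assumed in the text
STATES = list(product(HIGHS, LOWS))

def seq(*cmds):
    """Sequential composition c1; c2; ... of set-valued commands."""
    def run(s):
        current = {s}
        for c in cmds:
            current = {t for u in current for t in c(u)}
        return frozenset(current)
    return run

def havoc_high(s):                   # HH: assign an arbitrary value to h
    return {(h, s[1]) for h in HIGHS}

def equational_secure(c):
    """HH; C; HH = C; HH, compared pointwise on all states."""
    lhs, rhs = seq(havoc_high, c, havoc_high), seq(c, havoc_high)
    return all(lhs(s) == rhs(s) for s in STATES)

def per_secure(c):
    """[[C]] : All x Id -> P[All x Id], for terminating commands."""
    def rel(A, B):
        return (all(any(a[1] == b[1] for b in B) for a in A) and
                all(any(a[1] == b[1] for a in A) for b in B))
    return all(rel(c(s), c(t))
               for s, t in product(STATES, repeat=2) if s[1] == t[1])

def skip(s):
    return {s}

def leak(s):                         # l := h
    return {(s[0], s[0])}

def copy_low_to_high(s):             # h := l
    return {(s[1], s[1])}

def coin_low(s):                     # l := 0 [] l := 1
    return {(s[0], 0), (s[0], 1)}

def leak_or_zero(s):                 # l := h [] l := 0
    return {(s[0], s[0]), (s[0], 0)}

for cmd in (skip, leak, copy_low_to_high, coin_low, leak_or_zero):
    print(cmd.__name__, equational_secure(cmd), per_secure(cmd))
```

On these five commands the two checks return the same verdict, which is what the equivalence stated above predicts for this simple setting.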

In a more recent extension of the paper, [LJ99], Leino and Joshi update 
their relational semantics to handle termination-sensitive leakages and intro- 
duce abstract variables — a way to support partially confidential data. Abstract 
variables $h$ and $l$ are defined as functions of the concrete variables in a program.
For example, for a list of low length and high elements $l$ would be the length of
the list and $h$ would be the list itself. In the general case the choice of $h$ and $l$
could be independent, so an independence condition must be verified.

Abstract variables are easily represented in our setting. Suppose that some
function $g \in \mathit{St} \to D$ yields the value (in some domain $D$) of the abstract low
variable from any given state, then we can represent the security condition on
abstract variables by: $[\![C]\!] : R_g \to \mathcal{P}_C[(\mathit{All} \times \mathit{Id})_\bot]$ where $s_1\ R_g\ s_2 \iff g\,s_1 = g\,s_2$.

5 A Probabilistic Security Condition

There are still some weaknesses in the security condition when interpreted in 
the convex powerdomain when it comes to the consideration of nondeterministic 
programs. In the usual terminology of information flow, we have considered 
possibilistic information flows. The probabilistic nature of an implementation 
may allow probabilistic information flows for “secure” programs. Consider the 
program 



h := h mod 100; (l := h [] l := rand(99)).

This program is secure in the convex powerdomain interpretation since regardless
of the initial value of $h$, the final value of $l$ can be any value in the range $\{0 \ldots 99\}$.
But with a reasonably fair implementation of the nondeterministic choice and of
the randomised assignment, it is clear that a few runs of the program, for a fixed
input value of $h$, could yield a rather clear indication of its value by observing
only the possible final values of $l$, e.g., 2, 17, 2, 45, 2, 2, 33, 2, 97, 2, 8, 57, 2, 2, 66, ...
from which we might reasonably conclude that the value of $h$ was 2.

To counter this problem we consider probabilistic powerdomains [JP89] which
allow the probabilistic nature of choice to be reflected in the semantics of
programs, and hence enable us to capture the fact that varying the value of $h$ causes
a change in the probability distribution of values of $l$.
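
To see the size of the effect, the distribution of the final value of $l$ can be computed exactly. The sketch below assumes, beyond what the text fixes, that the nondeterministic choice is resolved by a fair coin and that rand(99) is uniform on 0..99; the function name `final_low_distribution` and the parameter `choice_bias` are hypothetical.

```python
from fractions import Fraction

def final_low_distribution(h, choice_bias=Fraction(1, 2)):
    """Distribution of the final l for
         h := h mod 100; (l := h [] l := rand(99))
    assuming the left branch is taken with probability choice_bias and
    rand(99) is uniform on 0..99 (both are assumptions of this sketch)."""
    h = h % 100
    uniform = Fraction(1, 100)
    dist = {l: (1 - choice_bias) * uniform for l in range(100)}
    dist[h] += choice_bias
    return dist

d2 = final_low_distribution(2)
d3 = final_low_distribution(3)

print(set(d2) == set(d3))      # the possible outcomes do not depend on h
print(d2 == d3)                # but the distributions do
print(d2[2], d2[17])           # probability of observing 2 versus 17 when h = 2
```

Under these assumptions the supports coincide, which is why the possibilistic condition accepts the program, while the value of $h$ is by far the most likely observation.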

In the “possibilistic” setting we had the denotation of a command $C$ to be
a continuous function in $[\mathit{St}_\bot \to \mathcal{P}_C[\mathit{St}_\bot]]$. In the probabilistic case, given an
input to $C$ we keep track not only of possible outputs, but also of the probabilities
with which they appear. Thus, we consider a domain $\mathcal{E}[\mathit{St}_\bot]$ of distributions over
$\mathit{St}_\bot$. The denotation of $C$ is going to be a function in $[\mathit{St}_\bot \to \mathcal{E}[\mathit{St}_\bot]]$.

The general probabilistic powerdomain construction from [JP89] on an
inductive partial order, $\mathcal{E}[D]$, is taken to be the domain of evaluations, which are
certain continuous functions in $\Omega(D) \to [0, 1]$, where $\Omega(D)$ is the lattice of open
subsets of $D$. We will omit a description of the general probabilistic
powerdomain of evaluations since for the present paper it is sufficient and more intuitive
to work with discrete domains, and hence a simplified notion of probabilistic
powerdomain in terms of distributions.

If $S$ is a set (e.g., the domain of states for a simple sequential language)
then we define the probabilistic powerdomain of $S_\bot$, written $\mathcal{E}[S_\bot]$, to be the
domain of distributions on $S_\bot$, where a distribution $\mu$ is a function from $S_\bot$
to $[0, 1]$ such that $\sum_{d \in S_\bot} \mu d = 1$. The ordering on $\mathcal{E}[S_\bot]$ is defined pointwise by
$\mu \le \nu$ iff $\forall d \ne \bot.\ \mu d \le \nu d$. This structure is isomorphic to Jones and Plotkin’s
probabilistic powerdomain of evaluations for this special case.

As a simple instance of the probabilistic powerdomain construction from
[JP89], one can easily see that $\mathcal{E}[S]$ is an inductively complete partial order with
directed lubs defined pointwise, and with a least element $\omega = \eta_S(\bot)$, where $\eta_S$
is the point-mass distribution defined for an $x \in S$ by

$\eta_S(x)d = \begin{cases} 1, & \text{if } d = x, \\ 0, & \text{otherwise.} \end{cases}$

To lift a function $f : D_1 \to \mathcal{E}[D_2]$ to type $\mathcal{E}[D_1] \to \mathcal{E}[D_2]$ we define the extension
of $f$ by

$f^*(\mu)(y) = \sum_{x \in D_1} f(x)(y)\,\mu(x)$.

The structure $(\mathcal{E}[D], \eta_D(x), {}^*)$ is a Kleisli triple, and thus we have a canonical
way of composing the probabilistic semantics of any two given programs. Suppose
$f : D_1 \to \mathcal{E}[D_2]$ and $g : D_2 \to \mathcal{E}[D_3]$ are such. Then the lifted composition
$(g^* \circ f)^*$ can be computed by one of the Kleisli triple laws as $g^* \circ f^*$.
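
For finite discrete distributions the unit and the extension are easy to write down. The following Python sketch represents a distribution as a dictionary from outcomes to exact probabilities; the representation and the names `unit`, `extend`, `coin` and `double` are assumptions of this example, and $\bot$ is not modelled.

```python
from collections import defaultdict
from fractions import Fraction

def unit(x):
    """Point-mass distribution on x."""
    return {x: Fraction(1)}

def extend(f):
    """Kleisli extension: f*(mu)(y) = sum over x of f(x)(y) * mu(x)."""
    def f_star(mu):
        out = defaultdict(Fraction)
        for x, p in mu.items():
            for y, q in f(x).items():
                out[y] += p * q
        return dict(out)
    return f_star

def coin(x):                 # stay at x or move to x + 1, each with probability 1/2
    return {x: Fraction(1, 2), x + 1: Fraction(1, 2)}

def double(x):               # deterministic step, as a point mass
    return unit(2 * x)

mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}

# Kleisli laws on this example.
print(extend(unit)(mu) == mu)
print(extend(coin)(unit(5)) == coin(5))
lifted_composition = extend(lambda x: extend(double)(coin(x)))
print(lifted_composition(mu) == extend(double)(extend(coin)(mu)))
```

The last line is the law used in the text: lifting the composite agrees with composing the liftings.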

The next step towards the security condition is to define how pers work on 
discrete probabilistic powerdomains. To lift pers to $\mathcal{E}[D]$ we need to consider a
definition which takes into consideration the whole of each $R$-equivalence class in
one go. The intuition is that an equivalence class of a per is a set of points that are
indistinguishable by a low-level observer. For a given evaluation, the probability 
of a given observation by a low level user is thus the sum of probabilities over 
all elements of the equivalence class. 

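Define the per relation $\mathcal{E}[R]$ on $\mathcal{E}[D]$ for $\mu, \nu \in \mathcal{E}[D]$ by

$$\mu \; \mathcal{E}[R] \; \nu \iff \forall d \in |R|.\ \sum_{e \in [d]_R} \mu e = \sum_{e \in [d]_R} \nu e,$$

where $[d]_R$ stands for the $R$-equivalence class which contains $d$. Naturally,
$\mu \; \mathcal{E}[Id] \; \nu \iff \mu = \nu$ and $\forall \mu, \nu \in \mathcal{E}[D].\ \mu \; \mathcal{E}[All] \; \nu$.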
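
As a hedged illustration of the lifted relation, the sketch below compares two finite distributions class by class. It assumes the equivalence classes are given explicitly as lists, ignores the bottom element, and uses hypothetical two-valued high and low components.

```python
from fractions import Fraction

def related(mu, nu, classes):
    """mu E[R] nu: both distributions put the same mass on every R-class."""
    def mass(dist, cls):
        return sum((dist.get(e, Fraction(0)) for e in cls), Fraction(0))
    return all(mass(mu, c) == mass(nu, c) for c in classes)

# Hypothetical states: (high, low) pairs with two values per component.
states = [(h, l) for h in (0, 1) for l in (0, 1)]
low_classes = [[s for s in states if s[1] == l] for l in (0, 1)]  # All x Id
id_classes = [[s] for s in states]                                # Id
all_classes = [states]                                            # All

mu = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
nu = {(1, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}
print(related(mu, nu, low_classes), related(mu, nu, id_classes))
```

Here `mu` and `nu` differ as distributions, yet a low-level observer who sees only the second component cannot tell them apart.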

As an example, consider $\mathcal{E}[(All \times Id)_\perp]$. Two distributions $\mu$ and $\nu$ in
$(All \times Id)_\perp \to [0,1]$ are equal if the probability of any given low value $l$ in the
left-hand distribution, given by $\sum_h \mu(h,l)$, is equal to the probability in the
right-hand distribution, namely $\sum_h \nu(h,l)$.

The probabilistic security condition is indeed a strengthening of the possi- 
bilistic one - when we consider programs whose possibilistic and probabilistic 
semantics are in agreement. 

Theorem 4. Suppose we have a possibilistic (convex) semantics $[\![\cdot]\!]_C$ and a
probabilistic semantics $[\![\cdot]\!]_\mathcal{E}$, which satisfy a basic consistency property that for
any command $C$, if $[\![C]\!]_\mathcal{E}\, i\, o > 0$ then $o \in [\![C]\!]_C\, i$.

Now suppose that $R$ and $S$ are equivalence relations on $D$. Suppose further
that $C$ is any command such that its possibilistic behaviour agrees with its
probabilistic behaviour, i.e., $o \in [\![C]\!]_C\, i \iff [\![C]\!]_\mathcal{E}\, i\, o > 0$. Then we have that

$[\![C]\!]_\mathcal{E} : R \Rightarrow \mathcal{E}[S]$ implies $[\![C]\!]_C : R \Rightarrow \mathcal{P}_C[S]$.

In the case that the state is modelled by a pair representing a high variable
and a low variable respectively, it is easy to see that a command $C$ is secure
($[\![C]\!]_\mathcal{E} : (All \times Id)_\perp \Rightarrow \mathcal{E}[(All \times Id)_\perp]$) if and only if

$[\![C]\!]_\mathcal{E}(i_h, i_l)\,\perp = [\![C]\!]_\mathcal{E}(i'_h, i_l)\,\perp$ and

$$\sum_{h \in St_{high}} [\![C]\!]_\mathcal{E}(i_h, i_l)(h, o_l) = \sum_{h \in St_{high}} [\![C]\!]_\mathcal{E}(i'_h, i_l)(h, o_l)$$

for any $i_l$, $i_h$, $i'_h$ and $o_l$. Intuitively the equation means that if you vary $i_h$ the
distribution of low variables (the sums provide "forgetting" the highs) does not
change.




A Per Model of Secure Information Flow in Sequential Programs 






Let us introduce probabilistic powerdomain semantics definitions for some
language constructs. Here we omit the $\mathcal{E}$-subscripts to mean the probabilistic
semantics. Given two programs $C_1, C_2$ such that $[\![C_1]\!] : St_\perp \to \mathcal{E}[St_\perp]$ and
$[\![C_2]\!] : St_\perp \to \mathcal{E}[St_\perp]$, the composition of two program semantics is defined by:

$$[\![C_1; C_2]\!]\, i\, o = \sum_{s \in St_\perp} ([\![C_1]\!]\, i\, s) * ([\![C_2]\!]\, s\, o).$$

The semantics of the uniformly distributed nondeterministic choice $C_1 \mathbin{[]} C_2$ is
defined by $[\![C_1 \mathbin{[]} C_2]\!]\, i\, o = 0.5\,[\![C_1]\!]\, i\, o + 0.5\,[\![C_2]\!]\, i\, o$. Consult [JP89] for an account
of how to define the semantics of other language constructs.

Example. Recall the program

h := h mod 100; (l := h [] l := rand(99))

Now we investigate the security condition by varying the value of h from 0 to 1.
Take $i_l = 0$, $i_h = 0$, $i'_h = 1$ and $o_l = 0$. The left-hand side is

$$\sum_{h \in [0, \ldots, 100]} [\![C]\!](0, 0)(h, 0) = 0.5 * 1 + 0.5 * 0.01 = 0.505,$$

whereas the right-hand side is

$$\sum_{h \in [0, \ldots, 100]} [\![C]\!](1, 0)(h, 0) = 0.5 * 0 + 0.5 * 0.01 = 0.005.$$

So, the security condition does not hold and the program must be rejected.
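
The two sums can be reproduced mechanically. The sketch below is an illustration under stated assumptions: states are `(h, l)` pairs, every command terminates (so the bottom component is omitted), and `rand(99)` is taken to be uniform over 0..99, which is what the factor 0.01 suggests. The helper names are hypothetical.

```python
from fractions import Fraction

def seq(c1, c2):
    """Sequential composition: sum over intermediate states s."""
    def run(i):
        out = {}
        for s, p in c1(i).items():
            for o, q in c2(s).items():
                out[o] = out.get(o, Fraction(0)) + p * q
        return out
    return run

def choice(c1, c2):
    """Uniform choice: each branch contributes with weight 1/2."""
    def run(i):
        out = {}
        for c in (c1, c2):
            for o, p in c(i).items():
                out[o] = out.get(o, Fraction(0)) + p / 2
        return out
    return run

def mod100(s):            # h := h mod 100
    h, l = s
    return {(h % 100, l): Fraction(1)}

def copy_h(s):            # l := h
    h, l = s
    return {(h, h): Fraction(1)}

def rand99(s):            # l := rand(99), assumed uniform on 0..99
    h, l = s
    return {(h, r): Fraction(1, 100) for r in range(100)}

program = seq(mod100, choice(copy_h, rand99))

def low_prob(dist, o_l):
    """Probability of low output o_l, summing out the high component."""
    return sum((p for (h, l), p in dist.items() if l == o_l), Fraction(0))

lhs = low_prob(program((0, 0)), 0)   # i_h = 0
rhs = low_prob(program((1, 0)), 0)   # i'_h = 1
print(float(lhs), float(rhs))
```

Since the two low-output probabilities differ, varying the high input changes what a low observer sees, which is the violation described above.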

Volpano and Smith recently devised a probabilistic security type-system 
[VS98] with a soundness proof based on a probabilistic operational semantics. 
Although the security condition that they use in their correctness argument 
is not directly comparable - due to the fact that they consider parallel deter- 
ministic threads and a non-compositional semantics - we can easily turn their 
examples into nondeterministic sequential programs with the same probabilis- 
tic behaviours. In the extended version of this paper [SS99] we show how their 
examples can all be verified using our security condition. 

6 Conclusions 

We have developed an extensional semantics-based specification of secure infor- 
mation flow in sequential programs, by embracing and extending earlier work 
on the use of partial equivalence relations to model binding times in [HS91]. We 
have shown how this idea can be extended to handle nondeterminism and also 
probabilistic information flow. 

We recently became aware of work by Abadi, Banerjee, Heintze and Riecke 
[ABHR99] which shows that a single calculus (DCC), based on Moggi’s com- 
putational lambda calculus, can capture a number of specific static analyses for 
security, binding-time analysis, program slicing and call-tracking. Although their
calculus does not handle nondeterministic language features, it is notable that 
the semantic model given to DCC is Per-based, and the logical presentations of 
the abstract interpretation for Per-based BTA from [HS91,Jen92,HL94] readily 
fit this framework (although this specific analysis is not one of those considered in 
[ABHR99]). They also show that what we have called “termination insensitive” 
analyses can be modelled by extending the semantic relations to relate bottom 
(nontermination) to every other domain point (without insisting on transitiv- 
ity). It is encouraging to note that - at least in the deterministic setting - this 
appears to create no technical difficulties. We do not, however, see any obvious 
way to make the probabilistic security condition insensitive to termination in a 
similar manner. 

We conclude by considering a few possible extensions and limitations: 

Multi-level security There is no problem with handling lattices of security 
levels rather than the simple high-low distinction. But one cannot expect to 
assign any intrinsic semantic meaning to such lattices of security levels, since 
they represent a “social phenomenon” which is external to the programming 
language semantics. In the presence of multiple security levels one must simply 
formulate conditions for security by considering information flows between levels 
in a pairwise fashion (although of course a specific static analysis is able to do 
something much more efficient). 

Downgrading and Trusting There are operations which are natural to
consider but which cannot be modelled in an obvious way in an extensional
framework. One such operation is the downgrading of information from high to
low without losing information - for example representing the secure encryption
of high level information. This seems impossible since an encryption operation
does not lose information about a value and yet should have type $high \to low$
- but the only functions of type $high \to low$ are the constant functions. An
analogous problem arises with Ørbæk and Palsberg's trust primitive if we try to
use pers to model their integrity analysis [ØP97].

Concurrency Handling nondeterminism can be viewed as the main step- 
ping stone to formulating a language-based security condition for concurrent 
languages, but this remains a topic for further work. 
Abstract. Def, the domain of definite Boolean functions, expresses
(sure) dependencies between the program variables of, say, a constraint 
program. Share, on the other hand, captures the (possible) variable shar- 
ing between the variables of a logic program. The connection between 
these domains has been explored in the domain comparison and decom- 
position literature. We develop this link further and show how the meet 
(as well as the join) of Def can be modelled with efficient (quadratic) op- 
erations on Share. Further, we show how by compressing and widening 
Share and by rescheduling meet operations, we can construct a depen- 
dency analysis that is surprisingly fast and precise, and comes with time- 
and space- performance guarantees. Unlike some other approaches, our 
analysis can be coded straightforwardly in Prolog. 

Keywords. (Constraint) logic programs, abstract interpretation, data- 
flow analysis, dependency analysis, definite Boolean functions, widening. 



1 Introduction 

Many analyses for logic programs, constraint logic programs and deductive 
databases use Boolean functions to express dependencies between program
variables. In groundness analysis [2,4,10,20,26], the formula $x \wedge (y \leftarrow z)$ describes a
state in which x is definitely ground, and there exists a grounding dependency 
such that whenever z becomes ground then so does y. Other useful properties 
like definiteness [5,21], strictness [19], and finiteness [6] can be also expressed 
and inferred with Boolean functions. Different classes of Boolean functions have 
different degrees of expressiveness. For example, Pos, the class of positive propo- 
sitional formulae, has the condensing [1] property and is rich enough for goal- 
independent analysis. Def, the class of definite positive propositional formulae, 
is less expressive [1] but has been proposed for goal-dependent analysis of con- 
straint programs [21]. 

The objective behind this work was to construct a goal-dependent ground- 
ness (and definiteness) analysis for logic (and constraint) programs, that was 
fast and precise enough to be practical, maintainable and easy to integrate into 
a Prolog compiler. Binary Decision Diagrams (BDD’s) [7] (and their derivatives 
like ROBDD’s) are the popular choice for implementing a dependency analysis 
[1,2,4,20,26]. These are essentially directed acyclic graphs in which identical
sub-graphs are collapsed together. BDD operations require pointer manipulation
and dynamic hashing [20] and thus BDD-based Pos analyses are usually
implemented in C [1,2,4,26]. Fecht [20] describes a notable exception that is
coded in ML. The advantage of using ML is that it is more declarative than C 
and therefore easier to maintain. The disadvantage is that it impedes integration 
into a Prolog compiler [25]. The ideal, we believe, is to implement a dependency 
analyser in ISO Prolog. The problem, then, is essentially one of performance. 

Our contribution to solving this problem is as follows: In terms of precision, 
we provide the first systematic precision experiments that compare Pos and Def 
for goal-dependent groundness (and definiteness) analysis. We found that Def 
was as precise as Pos for all our realistic Prolog and CLP($\mathcal{R}$) benchmarks. We
build on this and demonstrate how Def can be implemented efficiently and coded 
succinctly in Prolog. Our starting point is the work of Cortesi et al [15,16] that 
shows that Share, which is a domain whose elements are sets of sets of variables, 
can be used to encode Def. We develop this to show: 

— how the meet and join operations of Def can be computed straightforwardly 
based on this encoding, without the closure operation of Share [22] that has 
a worst-case exponential complexity; 

— how an operation (that we call compression) aids fixpoint detection; 

— how meet operations can be rescheduled to improve efficiency; 

— how widening can be applied to ensure that both the time-complexity of the 
analysis (the number of iterations) and the space-complexity (the number of 
sets of variables), grows linearly in the size of the program; 

— that the speed of our analysis compares surprisingly well against state-of- 
the-art BDD-based Pos analysers [4,20]. 

The rest of the paper is structured as follows. Section 2 surveys the neces- 
sary preliminaries. Section 3 recalls the relation between Share and Def and is 
included so that the paper is self-contained. Section 4 shows how the meet and 
join operations of Def can be computed efficiently using a Share based represen- 
tation. Section 5 introduces compression and meet scheduling whereas Section 6 
discusses widening. Section 7 describes the implementation. Section 8 reviews 
the related work, and finally Section 9 presents our conclusions. 

2 Preliminaries 

In this section, we introduce some notation and recall the definitions of Boolean
functions and the domain Share. For a set $S$, $|S|$ denotes the cardinality and
$\wp(S)$ the powerset of $S$. $Var$ denotes a denumerable set (universe) of variables
and $X \subseteq Var$ denotes a finite set of variables; the set of variables occurring in a
syntactic object $o$ is denoted by $var(o)$; the set of all idempotent substitutions
is denoted by $Sub$; and $Bool$ is defined to be $\{true, false\}$.

If $\langle S, \sqsubseteq \rangle$ is a poset with top and bottom elements, and a meet $\sqcap$ and
join $\sqcup$, then the 4-tuple $\langle S, \sqsubseteq, \sqcap, \sqcup \rangle$ denotes the corresponding lattice. A map
$g : L \to K$, where $L$ and $K$ are lattices, is a homomorphism iff $g$ is join-preserving
and meet-preserving, that is, $g(a \sqcup b) = g(a) \sqcup g(b)$ and $g(a \sqcap b) = g(a) \sqcap g(b)$ for
all $a, b \in L$. An isomorphism is a bijective homomorphism.
2.1 Boolean Functions

A Boolean function is a function $f : Bool^n \to Bool$ where $n \ge 0$. A Boolean
function can be represented by a propositional formula over $X$ where $|X| = n$.
The set of propositional formulae over $X$ is denoted by $Bool_X$. We use Boolean
functions and propositional formulae interchangeably without worrying about
the distinction [1]. We follow the convention of identifying a truth assignment
with the set of variables that it maps to true.

Definition 1 ($model_X$). The (bijective) map $model_X : Bool_X \to \wp(\wp(X))$ is
defined by: $model_X(f) = \{M \subseteq X \mid (\wedge M) \wedge \neg(\vee (X \setminus M)) \models f\}$.

Example 1. If $X = \{x, y\}$, then the function $\{\langle true, true \rangle \mapsto true$, $\langle true, false \rangle \mapsto false$,
$\langle false, true \rangle \mapsto false$, $\langle false, false \rangle \mapsto false\}$ can be represented by
$x \wedge y$. Also $model_X(x \wedge y) = \{\{x, y\}\}$ and $model_X(x \vee y) = \{\{x\}, \{y\}, \{x, y\}\}$.



Definition 2 ($Pos_X$, $Def_X$, $Mon_X$). $Pos_X$ is the set of positive Boolean
functions over $X$. A function $f$ is positive iff $X \in model_X(f)$. $Def_X$ is the set of
positive functions over $X$ that are definite. A function $f$ is definite iff
$M \cap M' \in model_X(f)$ for all $M, M' \in model_X(f)$. $Mon_X$ is the set of monotonic
Boolean functions over $X$. A function $f$ is monotonic iff $M \in model_X(f)$ implies
$M' \in model_X(f)$ for all $M'$ such that $M \subseteq M' \subseteq X$.

Note that $Def_X \subseteq Pos_X$ and $Mon_X \not\subseteq Pos_X$. It is possible to show that each
$f \in Def_X$ is equivalent to a conjunction of definite (propositional) clauses, that
is, $f = \wedge_{i=1}^{m} (y_i \leftarrow \wedge Y_i)$ [18].

Example 2. Suppose $X = \{x, y, z\}$ and consider the following table, which states,
for some Boolean functions, whether they are in $Def_X$, $Pos_X$ or $Mon_X$, and also
gives $model_X$.

$f$                         $Def_X$  $Pos_X$  $Mon_X$  $model_X(f)$
$false$                                       •        $\emptyset$
$x \wedge y$                •        •        •        $\{\{x,y\}, \{x,y,z\}\}$
$x \vee y$                           •        •        $\{\{x\}, \{y\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$
$x \leftarrow y$            •        •                 $\{\emptyset, \{x\}, \{z\}, \{x,y\}, \{x,z\}, \{x,y,z\}\}$
$x \vee (y \leftarrow z)$            •                 $\{\emptyset, \{x\}, \{y\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$
$true$                      •        •        •        $\{\emptyset, \{x\}, \{y\}, \{z\}, \{x,y\}, \{x,z\}, \{y,z\}, \{x,y,z\}\}$

Note, in particular, that $x \vee y$ is not in $Def_X$ (since its set of models is not closed
under intersection) and that $false$ is neither in $Pos_X$ nor $Def_X$.
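
The membership columns of the table can be checked by brute force. The following is a small sketch, assuming a Boolean function is supplied as a Python predicate on the set of variables that are true; the helper names are hypothetical.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def models(f, X):
    """model_X(f): the subsets M of X (true variables) that satisfy f."""
    return {M for M in powerset(X) if f(M)}

def in_pos(ms, X):
    return frozenset(X) in ms

def in_def(ms, X):
    # positive, and the set of models is closed under intersection
    return in_pos(ms, X) and all(a & b in ms for a in ms for b in ms)

def in_mon(ms, X):
    # every superset of a model is again a model
    return all(N in ms for M in ms for N in powerset(X) if M <= N)

X = {"x", "y", "z"}
table = {
    "false":  models(lambda M: False, X),
    "x or y": models(lambda M: "x" in M or "y" in M, X),
    "x <- y": models(lambda M: "x" in M or "y" not in M, X),
}
for name, ms in table.items():
    print(name, in_def(ms, X), in_pos(ms, X), in_mon(ms, X))
```

Enumerating all subsets is exponential in the number of variables, so this is only meant for checking small examples by hand.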

Defining $f_1 \dot{\vee} f_2 = \wedge\{f \in Def_X \mid f_1 \models f \wedge f_2 \models f\}$, the 4-tuple $\langle Def_X, \models, \wedge, \dot{\vee} \rangle$ is
a finite lattice [1], where $true$ is the top element and $\wedge X$ is the bottom element.
Existential quantification is defined by Schröder's Elimination Principle, that is,
$\exists x.f = f[x \mapsto true] \vee f[x \mapsto false]$. Note that $\exists x.f \in Def_X$ if $f \in Def_X$ [1].

Example 3. If $X = \{x, y\}$ then $x \dot{\vee} (x \leftrightarrow y) = \wedge\{(x \leftarrow y), true\} = (x \leftarrow y)$,
as can be seen in the Hasse diagram for $Def_X$ (Fig. 1). Note also that $x \dot{\vee} y =
\wedge\{true\} = true \ne (x \vee y)$.
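
On model sets, the join in Def is the smallest intersection-closed set containing the models of both operands, since a definite function is exactly one whose models are closed under intersection. The sketch below uses this characterisation to recompute the two joins of the example; function names are hypothetical.

```python
from itertools import combinations

def powerset(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def models(f, X):
    return {M for M in powerset(X) if f(M)}

def def_join(m1, m2):
    """Close the union of two model sets under pairwise intersection."""
    closed = set(m1) | set(m2)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                if a & b not in closed:
                    closed.add(a & b)
                    changed = True
    return closed

X = {"x", "y"}
m_x = models(lambda M: "x" in M, X)
m_y = models(lambda M: "y" in M, X)
m_x_iff_y = models(lambda M: ("x" in M) == ("y" in M), X)
m_x_if_y = models(lambda M: "x" in M or "y" not in M, X)   # x <- y
m_true = models(lambda M: True, X)
m_x_or_y = models(lambda M: "x" in M or "y" in M, X)

print(def_join(m_x, m_x_iff_y) == m_x_if_y)
print(def_join(m_x, m_y) == m_true)
```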



Fig. 1. Hasse diagrams 



The maximum number of iterations of a fixpoint analysis relates to the length of
the longest ascending chain in the underlying domain. For $Pos_X$, it is well-known
that the longest chain has length $2^n - 1$ where $|X| = n$. It is less well-known
that the same holds for $Def_X$.

Proposition 1. Let $|X| = n$. Let $f_1 \models f_2 \models \cdots \models f_k$ be a maximal strictly
ascending chain where $f_i \in Def_X$ for all $i \in \{1, \ldots, k\}$. Then $k = 2^n$.



2.2 Sharing Abstractions 

For completeness, we introduce the basic ideas behind the Share domain [22]. 
This domain traces the possible variable sharing behaviour of a logic program. 
Two variables share if they are bound to terms that contain a common variable. 

Definition 3 ($Share_X$). $Share_X = \wp(\wp(X) \setminus \{\emptyset\})$.

Thus we have the finite lattice $\langle Share_X, \subseteq, \cap, \cup \rangle$. The top element is $\wp(X) \setminus \{\emptyset\}$
and the bottom element is $\emptyset$.

Definition 4 ($\alpha^{sh}_X$, $\gamma^{sh}_X$). The abstraction map $\alpha^{sh}_X : \wp(Sub) \to Share_X$ is
defined as $\alpha^{sh}_X(\Theta) = \{occ(\theta, v) \cap X \mid \theta \in \Theta \wedge v \in Var\} \setminus \{\emptyset\}$ where $occ(\theta, v) = \{x \in
Var \mid v \in var(\theta(x))\}$. The concretisation map $\gamma^{sh}_X : Share_X \to \wp(Sub)$ is defined
as $\gamma^{sh}_X(S) = \{\theta \in Sub \mid \alpha^{sh}_X(\{\theta\}) \subseteq S\}$.

To streamline the theory and reduce the size of abstractions, the empty set
is never included in a share set. However there is some loss of information.
That is, if every element of $\Theta$ maps every element of $X$ to a ground term then
$\alpha^{sh}_X(\Theta) = \{\emptyset\} \setminus \{\emptyset\} = \emptyset = \alpha^{sh}_X(\emptyset)$. Thus $\alpha^{sh}_X$ (and hence $\gamma^{sh}_X$) cannot distinguish
between a set of ground substitutions and the empty set. In practice, the empty
set only arises when a computation fails and this would normally be flagged
elsewhere in the analyser [9].
Example 4. Let $X = \{x, y, z\}$ and consider abstracting (a) $x = f(y, z)$ (b) where
at program point (a), no variable in $X$ is ground or shares with any other element
of $X$. The bindings on $X$, for example, could be $\theta_a$ or $\theta'_a$ as given below. Then
the bindings at (b) would be $\theta_b$ or $\theta'_b$, respectively.

$\theta_a = \{y \mapsto g(u), z \mapsto v\}$    $\theta_b = \{x \mapsto f(g(u), v), y \mapsto g(u), z \mapsto v\}$

$\theta'_a = \{x \mapsto f(u, u)\}$    $\theta'_b = \{x \mapsto f(y, y), z \mapsto y\}$

The abstraction $S_a = \{\{x\}, \{y\}, \{z\}\}$ describes $\theta_a$, that is $\theta_a \in \gamma^{sh}_X(S_a)$, since
$occ(\theta_a, x) = \{x\}$, $occ(\theta_a, u) = \{u, y\}$, $occ(\theta_a, v) = \{v, z\}$ and $occ(\theta_a, y) =
occ(\theta_a, z) = \emptyset$. Similarly $\theta'_a \in \gamma^{sh}_X(S_a)$. The abstract unification operation of
Jacobs and Langen [22] will compute the abstraction $S_b = \{\{x, y\}, \{x, z\}, \{x, y, z\}\}$
for the program point (b). A safety result of Jacobs and Langen [22] asserts that
$\theta_b, \theta'_b \in \gamma^{sh}_X(S_b)$. Indeed, we see that $\theta_b \in \gamma^{sh}_X(S_b)$ since $occ(\theta_b, u) = \{u, x, y\}$,
$occ(\theta_b, v) = \{v, x, z\}$, and $occ(\theta_b, x) = occ(\theta_b, y) = occ(\theta_b, z) = \emptyset$. The reader is
encouraged to verify that $\theta'_b \in \gamma^{sh}_X(S_b)$.
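
The occurrence sets in this example can be recomputed with a few lines of code. This is a sketch under simplifying assumptions: a substitution is represented only by the set of variables occurring in each bound term (so `{"y": {"u"}}` stands for a binding such as y to g(u)), unbound variables map to themselves, and the infinite universe Var is cut down to the five variables that matter here.

```python
def occ(theta, v, universe):
    """occ(theta, v): variables x whose image under theta contains v."""
    return frozenset(x for x in universe if v in theta.get(x, {x}))

def alpha_sh(theta, X, universe):
    """Sharing abstraction of a single substitution, restricted to X."""
    groups = {occ(theta, v, universe) & X for v in universe}
    groups.discard(frozenset())
    return groups

X = {"x", "y", "z"}
universe = X | {"u", "v"}

theta_a = {"y": {"u"}, "z": {"v"}}                       # y -> g(u), z -> v
theta_b = {"x": {"u", "v"}, "y": {"u"}, "z": {"v"}}      # x -> f(g(u), v), ...
theta_b2 = {"x": {"y"}, "z": {"y"}}                      # x -> f(y, y), z -> y

fs = frozenset
S_a = {fs("x"), fs("y"), fs("z")}
S_b = {fs("xy"), fs("xz"), fs("xyz")}

print(alpha_sh(theta_a, X, universe) <= S_a)
print(alpha_sh(theta_b, X, universe) <= S_b)
print(alpha_sh(theta_b2, X, universe) <= S_b)
```

Membership in the concretisation is tested as containment of the abstraction of the single substitution, following the definition of the concretisation map.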

3 Quotienting $Share_X$ to obtain $Def_X$

In this section we construct a homomorphism from $Share_X$ to $Def_X$. We recall
the well-known connection between $Share_X$ and $Def_X$ [13,14,15,16]. For the
elements of $Share_X$, we define an abstraction $\alpha_X$ which interprets a sharing
abstraction as representing a set of models and hence a Boolean function.

Definition 5 ($\alpha_X$). The (abstraction) map $\alpha_X : Share_X \to Def_X$ is defined as
follows: $\alpha_X(S) = model_X^{-1}(\{X \setminus (\cup S') \mid S' \subseteq S\})$.

The definition of $\alpha_X$ is essentially that of $\alpha$ of Cortesi et al [14, Section 8.4],
adapted to our definition of $Share_X$. $\alpha_X$ is well-defined, that is, $\alpha_X(S) \in Def_X$
for all $S \in Share_X$. First, since $X \in model_X(\alpha_X(S))$, it follows that $\alpha_X(S) \in
Pos_X$. Secondly, if $M_1, M_2 \in model_X(\alpha_X(S))$ then $M_i = X \setminus (\cup S_i)$ where $S_i \subseteq S$
($i = 1, 2$). Clearly $S_1 \cup S_2 \subseteq S$. As $M_1 \cap M_2 = X \setminus (\cup(S_1 \cup S_2))$, it follows that
$M_1 \cap M_2 \in model_X(\alpha_X(S))$.

Lemma 1. $\alpha_X$ is surjective.

However, $\alpha_X$ is not injective, and thus it is a strict abstraction of $Share_X$. As
an example, consider $X = \{x, y\}$, $S_1 = \{\{x\}, \{y\}\}$ and $S_2 = S_1 \cup \{\{x, y\}\}$. Then
$\alpha_X(S_1) = model_X^{-1}(\{\emptyset, \{x\}, \{y\}, \{x, y\}\}) = \alpha_X(S_2)$ but $S_1 \ne S_2$.

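Example 5. Let $X = \{x, y, z\}$ and $S = \{G_1, G_2, G_3\}$ where $G_1 = \{x\}$, $G_2 =
\{y, z\}$ and $G_3 = \{z\}$. The table illustrates how $\alpha_X(S)$ can be computed by
enumerating $\cup S'$ and $X \setminus (\cup S')$ for all $S' \subseteq S$.

$S'$             $\cup S'$      $X \setminus (\cup S')$     $S'$                 $\cup S'$      $X \setminus (\cup S')$
$\emptyset$      $\emptyset$    $\{x, y, z\}$               $\{G_3\}$            $\{z\}$        $\{x, y\}$
$\{G_1\}$        $\{x\}$        $\{y, z\}$                  $\{G_1, G_3\}$       $\{x, z\}$     $\{y\}$
$\{G_2\}$        $\{y, z\}$     $\{x\}$                     $\{G_2, G_3\}$       $\{y, z\}$     $\{x\}$
$\{G_1, G_2\}$   $\{x, y, z\}$  $\emptyset$                 $\{G_1, G_2, G_3\}$  $\{x, y, z\}$  $\emptyset$

Thus $\alpha_X(S) = model_X^{-1}(\{\emptyset, \{x\}, \{y\}, \{x, y\}, \{y, z\}, \{x, y, z\}\}) = (y \leftarrow z)$. The
reader is encouraged to verify that $\alpha_X(\emptyset) = \wedge X$ and $\alpha_X(\{\{x\} \mid x \in X\}) = true$.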
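
The enumeration in the table is easy to mechanise. The sketch below computes the model set of the abstraction directly from its definition, by taking the complement of the union of every subset of S; it is exponential in the size of S and is meant only for checking small cases. The function name is hypothetical.

```python
from itertools import combinations

def alpha_models(S, X):
    """Models of alpha_X(S): X minus the union of S', for every S' within S."""
    groups = list(S)
    result = set()
    for r in range(len(groups) + 1):
        for chosen in combinations(groups, r):
            union = frozenset().union(*chosen)
            result.add(frozenset(X) - union)
    return result

fs = frozenset
X = {"x", "y", "z"}
S = {fs("x"), fs("yz"), fs("z")}          # G1, G2, G3 of the example

ms = alpha_models(S, X)
# y <- z holds in M exactly when z in M implies y in M
print(all(("z" not in M) or ("y" in M) for M in ms), len(ms))
```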
It is perhaps easier to interpret an abstraction of $Share_X$ as a definite Boolean
function by using the $C : Share_X \to Def_X$ abstraction map of Cortesi et al
[15,16]. $C$ can be expressed particularly succinctly using the auxiliary operation
$rel$ which, given a set of variables $Y$ and an $S \in Share_X$, selects the subset of $S$
which is relevant to the variables of $Y$.

Definition 6 ($rel$). The map $rel : \wp(X) \times Share_X \to Share_X$ is defined by:
$rel(Y, S) = \{G \in S \mid G \cap Y \ne \emptyset\}$.

Definition 7. The map $C : Share_X \to Def_X$ is defined by $C(S) = \wedge F$ where
$F = \{y \leftarrow \wedge Y \mid y \in X \wedge Y \subseteq X \setminus \{y\} \wedge rel(\{y\}, S) \subseteq rel(Y, S)\}$.

$F$ is defined with $Y \subseteq X \setminus \{y\}$ rather than $Y \subseteq X$ to keep its size manageable.

Example 6. Consider again Example 5. The set of $Y \subseteq X$ such that
$rel(\{x\}, S) \subseteq rel(Y, S)$ is $\{\{x\}, \{x, y\}, \{x, z\}, \{x, y, z\}\}$. Likewise, the set of $Y \subseteq X$
such that $rel(\{y\}, S) \subseteq rel(Y, S)$ is $\{\{y\}, \{z\}, \{x, z\}, \{x, y\}, \{y, z\}, \{x, y, z\}\}$.
Finally, the set of $Y \subseteq X$ such that $rel(\{z\}, S) \subseteq rel(Y, S)$ is $\{\{z\},
\{x, z\}, \{y, z\}, \{x, y, z\}\}$. Only $\{z\}$ and $\{x, z\}$, both for the head $y$, exclude
the head variable, so $F = \{y \leftarrow z,\ y \leftarrow (x \wedge z)\}$. Thus $C(S) = (y \leftarrow z)$.

The following proposition asserts the equivalence of $C$ and $\alpha_X$. It is proven by
Cortesi et al [14], albeit for slightly different definitions. Modifying their proof
to our definitions is straightforward.

Proposition 2. $C = \alpha_X$.

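By defining $S \equiv S'$ iff $\alpha_X(S) = \alpha_X(S')$, $\alpha_X$ induces an equivalence relation
on $Share_X$ which quotients $Share_X$. Using the closure under union operation
of Jacobs and Langen [22], we obtain a useful lemma about these equivalence
classes.

Definition 8. Let $S \in Share_X$. Then the closure under union $S^*$ of $S$ is defined
by: $S^* = \{\cup S' \mid S' \subseteq S\} \setminus \{\emptyset\}$.

Note that closure under union is exponential.

Lemma 2. Let $S_1, S_2 \in Share_X$. Then $S_1^* \equiv S_1$ and $S_1 \equiv S_2$ iff $S_1^* = S_2^*$.

We lift $\alpha_X$ to $\alpha_X : Share_X/{\equiv} \to Def_X$ by defining $\alpha_X([S]_\equiv) = \alpha_X(S)$. Since
$\alpha_X : Share_X \to Def_X$ is surjective it follows that $\alpha_X : Share_X/{\equiv} \to Def_X$ is
bijective. We now define, for the ordering $\models$ and the operations $\dot{\vee}$ and $\wedge$ on $Def_X$,
the analogous ordering $\sqsubseteq$ and operations $\sqcup$ and $\sqcap$ on $Share_X/{\equiv}$.

Definition 9 ($\sqsubseteq$, $\sqcup$, $\sqcap$).

$[S_1]_\equiv \sqsubseteq [S_2]_\equiv \iff \alpha_X([S_1]_\equiv) \models \alpha_X([S_2]_\equiv)$

$[S_1]_\equiv \sqcup [S_2]_\equiv = \alpha_X^{-1}(\alpha_X([S_1]_\equiv) \dot{\vee} \alpha_X([S_2]_\equiv))$

$[S_1]_\equiv \sqcap [S_2]_\equiv = \alpha_X^{-1}(\alpha_X([S_1]_\equiv) \wedge \alpha_X([S_2]_\equiv))$

Proposition 3. $\langle Share_X/{\equiv}, \sqsubseteq, \sqcap, \sqcup \rangle$ is a finite lattice.

It follows by construction that $\alpha_X$ is an isomorphism. For the dyadic case, the
isomorphism is illustrated in Fig. 1.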
4 Computing the join and meet within Share

In this section we show how the meet (as well as the join) of $Def_X$ can be
computed with $Share_X/{\equiv}$ via the isomorphism. It is not obvious from the
definition of $\dot{\vee}$ how $f_1 \dot{\vee} f_2$ is computed, and it turns out that $f_1$ and $f_2$ must be put
into (orthogonal) reduced monotonic body form [1]. In contrast, it is well-known
[15,16] that with the Share representation, join basically reduces to set union.

Proposition 4. $[S_1]_\equiv \sqcup [S_2]_\equiv = [S_1 \cup S_2]_\equiv$.

Example 7. Consider calculating $[S_1]_\equiv \sqcup [S_2]_\equiv$ where $X = \{w, x, y, z\}$, $S_1 =
\{\{w, x, y\}, \{x, y\}, \{y\}, \{z\}\}$ and $S_2 = \{\{w, z\}, \{x\}, \{y\}, \{z\}\}$. Note that $\alpha_X(S_1) =
(w \leftarrow x) \wedge (x \leftarrow y)$ and $\alpha_X(S_2) = w \leftarrow z$. Then $\alpha_X(S_1 \cup S_2) = \alpha_X(\{\{w, x, y\},
\{w, z\}, \{x\}, \{x, y\}, \{y\}, \{z\}\}) = (w \leftarrow (x \wedge z)) \wedge (w \leftarrow (y \wedge z))$ as required.

The challenge is in defining a computationally efficient meet. This is defined
in terms of a map $iff$ which, in turn, is defined in terms of the binary-union
operation of Jacobs and Langen [22]. We follow Cortesi et al [16] and denote
binary union as $\otimes$.

Definition 10 (binary-union, $\otimes$). The map $\otimes : Share_X^2 \to Share_X$ is defined
by: $S_1 \otimes S_2 = \{G_1 \cup G_2 \mid G_1 \in S_1 \wedge G_2 \in S_2\}$.

The $if$ and $iff$ maps defined below are similar to the classical abstract unification
operation of Jacobs and Langen [22]. Their interpretation, however, is that given
variable sets $Y_1$ and $Y_2$ and an abstraction $S$ such that $\alpha_X(S) = f$, $iff$ and $if$
compute new abstractions that represent $f \wedge (\wedge Y_1 \leftrightarrow \wedge Y_2)$ and $f \wedge (\wedge Y_1 \leftarrow \wedge Y_2)$.

Definition 11. The two maps $iff : \wp(X) \times \wp(X) \times Share_X \to Share_X$ and $if :
\wp(X) \times \wp(X) \times Share_X \to Share_X$ are defined by: $iff(Y_1, Y_2, S) =
(S \setminus (S_1 \cup S_2)) \cup (S_1 \otimes S_2)$ and $if(Y_1, Y_2, S) = (S \setminus S_1) \cup (S_1 \otimes S_2)$ where
$S_1 = rel(Y_1, S)$ and $S_2 = rel(Y_2, S)$.

One important difference between $iff$ and $if$ on the one hand and the abstract
unification algorithm of Jacobs and Langen [22] on the other hand is that $iff$
and $if$ involve no costly closure calculations that arise because of the transitivity
of variable sharing. Consequently the complexity of $iff$ and $if$ is not exponential in
the number of variable sets in $S$, but quadratic. This is a similar efficiency gain
to that obtained with the Share pair-sharing quotient of Bagnara et al [3].

Proposition 5. $\alpha_X(iff(Y_1, Y_2, S)) = \alpha_X(S) \wedge (\wedge Y_1 \leftrightarrow \wedge Y_2)$.

Corollary 1. $\alpha_X(if(Y_1, Y_2, S)) = \alpha_X(S) \wedge (\wedge Y_1 \leftarrow \wedge Y_2)$.

Even though $if(Y_1, Y_2, S)$ can be simulated with $iff(Y_1', Y_2, S)$ where $Y_1' = Y_1 \cup
Y_2$, it is cheaper to compute $rel(Y_1, S)$ than $rel(Y_1', S)$. This is one reason why
$if(Y_1, Y_2, S)$ is more efficient than $iff(Y_1', Y_2, S)$. The map $if$ is particularly useful
in the analysis of constraint logic programs, where a constraint like $x = y + z$ is
abstracted by $(x \leftarrow (y \wedge z)) \wedge (y \leftarrow (x \wedge z)) \wedge (z \leftarrow (x \wedge y))$.
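
The definitions of rel, binary union, iff and if translate almost literally into set comprehensions. The following is a sketch, assuming a sharing abstraction is a Python set of frozensets of variable names; `if_` carries a trailing underscore only because `if` is a Python keyword.

```python
def rel(Y, S):
    """Groups of S that mention at least one variable of Y."""
    return {G for G in S if G & Y}

def bin_union(S1, S2):
    """Binary union: all pairwise unions of a group of S1 with a group of S2."""
    return {G1 | G2 for G1 in S1 for G2 in S2}

def iff(Y1, Y2, S):
    S1, S2 = rel(Y1, S), rel(Y2, S)
    return (S - (S1 | S2)) | bin_union(S1, S2)

def if_(Y1, Y2, S):
    S1, S2 = rel(Y1, S), rel(Y2, S)
    return (S - S1) | bin_union(S1, S2)

fs = frozenset
S = {fs("x"), fs("xy"), fs("yz"), fs("xz")}
print(sorted(sorted(G) for G in iff({"y"}, {"z"}, S)))

# Adding x <- y to the abstraction of 'true' over {x, y}.
T = {fs("x"), fs("y")}
print(sorted(sorted(G) for G in if_({"x"}, {"y"}, T)))
```

Neither map computes a closure under union; the only potentially large intermediate is the binary union, whose size is at most the product of the sizes of the two relevant subsets.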

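Projection is an important component of a Def analysis in its own right [21].
For completeness, we state its correctness as a proposition.

Definition 12 (projection $\exists$). The map $\exists : \wp(X) \times Share_X \to Share_X$ is
defined by: $\exists Y.S = \{G \cap Y \mid G \in S\} \setminus \{\emptyset\}$.

Proposition 6. If $Y \subseteq X$ then $\exists(X \setminus Y).\alpha_X([S]_\equiv) = \alpha_Y([\exists Y.S]_\equiv)$.

Finally, Theorem 1 shows how meet can be computed with a sequence of $iff$
calculations.

Theorem 1. $[S_1]_\equiv \sqcap [S_2]_\equiv = [\exists X.S'_{n+1}]_\equiv$ where $X = \{x_1, \ldots, x_n\}$, $S'_1 = \rho(S_1) \cup
S_2$, $S'_{j+1} = iff(\{\rho(x_j)\}, \{x_j\}, S'_j)$ for $j \in \{1, \ldots, n\}$ and $\rho$ is a renaming such that
$\rho(X) \cap X = \emptyset$.

Note that $[S_1]_\equiv \sqcap [S_2]_\equiv$ could also be computed by $[S_1]_\equiv \sqcap [S_2]_\equiv = [S_1^* \cap S_2^*]_\equiv$.
This, however, would be inefficient.

Example 8. Consider calculating $[S_1]_\equiv \sqcap [S_2]_\equiv$ where $X = \{w, x, y\}$, $S_1 = \{\{w, x\},
\{x\}, \{y\}\}$ and $S_2 = \{\{w\}, \{x, y\}, \{y\}\}$. Thus $\alpha_X(S_1) = w \leftarrow x$ and $\alpha_X(S_2) =
x \leftarrow y$. If $\rho = \{w \mapsto w', x \mapsto x', y \mapsto y'\}$ then, applying Theorem 1 step by step,

$S'_1 = \{\{w', x'\}, \{x'\}, \{y'\}, \{w\}, \{x, y\}, \{y\}\}$
$S'_2 = \{\{w, w', x'\}, \{x'\}, \{y'\}, \{x, y\}, \{y\}\}$
$S'_3 = \{\{w, w', x, x', y\}, \{x, x', y\}, \{y'\}, \{y\}\}$
$S'_4 = \{\{w, w', x, x', y, y'\}, \{x, x', y, y'\}, \{y, y'\}\}$

Thus $[S_1]_\equiv \sqcap [S_2]_\equiv = [\exists X.S'_4]_\equiv = [\{\{w, x, y\}, \{x, y\}, \{y\}\}]_\equiv$. Observe that $\alpha_X([S_1]_\equiv \sqcap
[S_2]_\equiv) = (w \leftarrow x) \wedge (x \leftarrow y)$ as required.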
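
The meet of Theorem 1 can be sketched in a few lines on top of iff and projection. The code below is an illustration under assumptions: variables are strings, the renaming simply appends a prime (so no original variable name may end in a prime), and variables are processed in sorted order.

```python
def rel(Y, S):
    return {G for G in S if G & Y}

def iff(Y1, Y2, S):
    S1, S2 = rel(Y1, S), rel(Y2, S)
    return (S - (S1 | S2)) | {G1 | G2 for G1 in S1 for G2 in S2}

def project(Y, S):
    """Projection onto Y: intersect every group with Y, drop empty groups."""
    return {G & Y for G in S} - {frozenset()}

def meet(S1, S2, X):
    rho = {x: x + "'" for x in X}                      # hypothetical renaming
    S = {frozenset(rho[x] for x in G) for G in S1} | set(S2)
    for x in sorted(X):                                # naive schedule
        S = iff({rho[x]}, {x}, S)
    return project(frozenset(X), S)

fs = frozenset
X = {"w", "x", "y"}
S1 = {fs("wx"), fs("x"), fs("y")}     # represents w <- x
S2 = {fs("w"), fs("xy"), fs("y")}     # represents x <- y
print(sorted(sorted(G) for G in meet(S1, S2, X)))
```

The result is one representative of the equivalence class of the meet; a different schedule of the iff steps may yield a different but equivalent representative.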

5 Representing equivalence classes and meet rescheduling

In our analysis, the functions $f$ and $f'$ would be represented by elements of
$Share_X$, $S$ and $S'$, say. The fixpoint stability check, $f = f'$, amounts to checking
whether $[S]_\equiv = [S']_\equiv$ which, in turn, reduces to deciding whether $\alpha_X(S) =
\alpha_X(S')$. To make this test efficient we represent an equivalence class by its
smallest representative and thus introduce a compression operator $c$.

Definition 13. $c : Share_X \to Share_X$ is defined by: $c(S) = \cap\{S' \mid S' \equiv S\}$.

The following proposition explains how $c(S)$ is actually computed.

Proposition 7. Let $n = |X|$. Then $c(S) = S_n$ where $S_1 = \{G \in S \mid |G| = 1\}$
and $S_{j+1} = S_j \cup \{G \in S \mid |G| = j + 1 \wedge G \notin S_j^*\}$.

Trivially, if $S \equiv S'$, then $c(S) = c(S')$. From the proposition we also see that
$S^* = S_n^* = c(S)^*$ and hence if $c(S) = c(S')$ then $S^* = c(S)^* = c(S')^* = S'^*$
so that $S \equiv S'$ by Lemma 2. Hence $c(S) = c(S')$ iff $S \equiv S'$ and thus by testing
whether $c(S) = c(S')$ we can check for the fixpoint condition $S \equiv S'$.

When computing $c(S)$ we can test whether $G \notin S_j^*$ without actually computing
$S_j^*$ as follows. Suppose $S_j = \{G_1, \ldots, G_m\}$ and $G'_0 = G$. Then compute
$G'_i = G'_{i-1} \setminus G_i$ if $G_i \subseteq G$ and put $G'_i = G'_{i-1}$ otherwise. Then $G'_m = \emptyset$ iff
$G \in S_j^*$. Using this tactic we can compute $c(S)$ in quadratic time.
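
The compression procedure can be sketched as follows. Groups are visited in order of increasing size, and a group is kept only if it is not already a union of smaller kept groups, which is tested by the subtraction tactic described above rather than by computing a closure. The function names are hypothetical.

```python
def in_closure(G, kept):
    """True iff G is a union of groups in kept, i.e. G lies in kept*."""
    rest = set(G)
    for Gi in kept:
        if Gi <= G:
            rest -= Gi          # remove the variables covered by Gi
    return not rest

def compress(S):
    """Smallest representative c(S) of the equivalence class of S."""
    kept = []
    for G in sorted(S, key=len):            # S_1, S_2, ... by group size
        if not in_closure(G, kept):
            kept.append(G)
    return set(kept)

fs = frozenset
print(sorted(sorted(G) for G in compress({fs("x"), fs("y"), fs("xy")})))
print(compress({fs("x"), fs("y")}) == compress({fs("x"), fs("y"), fs("xy")}))
```

Two abstractions are then equivalent exactly when their compressed forms are equal as sets, which is the stability test used by the analysis.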

Projection can sometimes lead to abstractions that include redundant vari- 
able sets as is illustrated below. 
Example 9. Consider $S = \{\{x\}, \{y\}, \{x, y, z\}\}$ which, incidentally, represents
$\alpha_X(S) = (z \leftarrow x) \wedge (z \leftarrow y)$. Projecting onto $\{x, y\}$ like so $\exists\{x, y\}.S =
\{\{x\}, \{y\}, \{x, y\}\}$ introduces the set $\{x, y\}$, whereas $c(\exists\{x, y\}.S) = \{\{x\}, \{y\}\}$.

Compression is only applied to check for stability. In our framework, however,
projection always precedes a stability check. For example, the answer pattern for
a clause is obtained by projecting onto the head variables, and then a stability
check is applied to see if other clauses need to be re-evaluated. Thus, in our
framework, compression is applied after projection. Compression could be
applied more widely though since, in general, $iff(Y_1, Y_2, c(S)) \ne c(iff(Y_1, Y_2, S))$.

Example 10. Let $S = \{\{x\}, \{y, x\}, \{y, z\}, \{x, z\}\}$. Then $c(S) = S$ and $iff(\{y\}, \{z\},
S) = \{\{x\}, \{y, z\}, \{y, x, z\}\}$ but $c(\{\{x\}, \{y, z\}, \{y, x, z\}\}) = \{\{x\}, \{y, z\}\}$.

In practice, however, the space saving yielded by $c(iff(Y_1, Y_2, S))$ over
$iff(Y_1, Y_2, S)$ is usually small and not worth the effort of computing $c$.

Curiously, the efficiency of meet computations can often be significantly
improved by introducing some redundancy into the representation. Specifically, a
Boolean function is represented by a pair $\langle M, S \rangle$ where $M = X \setminus var(S)$. The
pair $\langle M, S \rangle$ does not include any information that is not present in $S$: it
simply flags those variables, $M$, that are ground (or definite). (This is reminiscent
of the reactive ROBDD representation of Bagnara [2].) This is very useful in
computing $[S_1]_\equiv \sqcap [S_2]_\equiv$ by the method prescribed in Theorem 1. Since meet is
commutative, $[S_1]_\equiv \sqcap [S_2]_\equiv$ can be computed by the sequence $S'_1 = \rho(S_1) \cup S_2$,
$S'_{j+1} = iff(\{\rho(x_{\pi(j)})\}, \{x_{\pi(j)}\}, S'_j)$ where $\pi$ is a permutation on $\{1, \ldots, n\}$. The
tactic is to choose a permutation with a maximal $m \in \{0, \ldots, n\}$ such that
$(\rho(x_{\pi(1)}) \in M \vee x_{\pi(1)} \in M) \wedge \cdots \wedge (\rho(x_{\pi(m)}) \in M \vee x_{\pi(m)} \in M)$ where $M = M_1 \cup M_2$,
$M_1 = X \setminus var(S_1)$ and $M_2 = X \setminus var(S_2)$. We call this technique meet
rescheduling, and illustrate its usefulness in the following example.

Example 11. Consider $[S_1]_\equiv \sqcap [S_2]_\equiv$ where $X = \{x_1, x_2, x_3\}$, $S_1 = \{\{x_1, x_2\}\}$ and
$S_2 = \{\{x_1\}, \{x_2, x_3\}\}$. Thus $\alpha_X(S_1) = (x_1 \leftrightarrow x_2) \wedge x_3$ and $\alpha_X(S_2) = (x_2 \leftrightarrow x_3)$.
Also $M_1 = \{x_3\}$, $M_2 = \emptyset$ and thus $M = \{x_3\}$. If $\rho = \{x_1 \mapsto x_1', x_2 \mapsto x_2', x_3 \mapsto
x_3'\}$ then scheduling naively and using $\pi = \{1 \mapsto 3, 2 \mapsto 1, 3 \mapsto 2\}$ we obtain,
respectively

naive schedule                                      schedule $\pi$
$S'_1 = \{\{x_1', x_2'\}, \{x_1\}, \{x_2, x_3\}\}$  $S'_2 = \{\{x_1', x_2'\}, \{x_1\}\}$
$S'_2 = \{\{x_1, x_1', x_2'\}, \{x_2, x_3\}\}$      $S'_3 = \{\{x_1, x_1', x_2'\}\}$
$S'_3 = \{\{x_1, x_1', x_2, x_2', x_3\}\}$          $S'_4 = \emptyset$

Note how the re-ordering $\pi$ tends to reduce the size of the intermediate $S'_j$.
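
The effect of rescheduling in this example can be measured by recording the total number of variable occurrences in each intermediate abstraction. The sketch below assumes the same string-based renaming as before and a hypothetical `schedule` helper that simply moves the flagged (ground) variables to the front.

```python
def rel(Y, S):
    return {G for G in S if G & Y}

def iff(Y1, Y2, S):
    S1, S2 = rel(Y1, S), rel(Y2, S)
    return (S - (S1 | S2)) | {G1 | G2 for G1 in S1 for G2 in S2}

def var(S):
    return frozenset().union(*S)

def schedule(X, S1, S2):
    """Order variables so that those flagged as ground come first."""
    ground = (X - var(S1)) | (X - var(S2))        # M = M1 u M2
    return sorted(X, key=lambda x: (x not in ground, x))

def meet_trace(S1, S2, order):
    """Run the iff sequence; record total variable occurrences per step."""
    S = {frozenset(x + "'" for x in G) for G in S1} | set(S2)
    sizes = [sum(len(G) for G in S)]
    for x in order:
        S = iff({x + "'"}, {x}, S)
        sizes.append(sum(len(G) for G in S))
    return S, sizes

X = frozenset({"x1", "x2", "x3"})
S1 = {frozenset({"x1", "x2"})}
S2 = {frozenset({"x1"}), frozenset({"x2", "x3"})}

naive = meet_trace(S1, S2, sorted(X))
resched = meet_trace(S1, S2, schedule(X, S1, S2))
print(naive[1], resched[1])
```

Both schedules end in the empty abstraction, which represents the conjunction of all three variables, but the rescheduled run carries fewer variable occurrences through the intermediate steps.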



A pair (M, S) representation is preferred to recomputing M prior to each meet 
because formulae typically occur as the operands of many meet operations. Thus 
M serves as a memo, avoiding unnecessary recomputation. 







Andy King, Jan-Georg Smaus, and Pat Hill 



6 Widening 

Apart from reducing the size of abstractions, it is also worthwhile to avoid
generating large abstractions that can arise from the quadratic growth of $iff(Y_1, Y_2, S)$
and $if(Y_1, Y_2, S)$ stemming from $S_1 \otimes S_2$. However, if $|S| = n$, $|S_1| = n_1$, $|S_2| = n_2$
then $|iff(Y_1, Y_2, S)| \le n + n_1 n_2 - (n_1 + n_2)$. Thus it is possible to detect that
$|iff(Y_1, Y_2, S)|$ will definitely be small, say less than a threshold $k$, without
computing $iff(Y_1, Y_2, S)$ itself. This leads to the following (widened) versions of $iff$
and $if$ that trade precision for efficiency.

Definition 14.

$$iff_k(Y_1, Y_2, S) = \begin{cases} iff(Y_1, Y_2, S) & \text{if } n + n_1 n_2 - (n_1 + n_2) < k \\ S & \text{otherwise} \end{cases}$$

$$if_k(Y_1, Y_2, S) = \begin{cases} if(Y_1, Y_2, S) & \text{if } n + n_1 n_2 - n_1 < k \\ S & \text{otherwise} \end{cases}$$

where $S_1 = rel(Y_1, S)$, $S_2 = rel(Y_2, S)$, $|S| = n$, $|S_1| = n_1$ and $|S_2| = n_2$.
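
A widened iff needs only the three cardinalities before deciding whether to do any work. The sketch below follows the shape of Definition 14; the threshold `k` is a parameter whose choice is left open here, and the example values are hypothetical.

```python
def rel(Y, S):
    return {G for G in S if G & Y}

def iff_k(Y1, Y2, S, k):
    """Widened iff: skip the constraint when the size estimate reaches k."""
    S1, S2 = rel(Y1, S), rel(Y2, S)
    n, n1, n2 = len(S), len(S1), len(S2)
    if n + n1 * n2 - (n1 + n2) < k:
        return (S - (S1 | S2)) | {G1 | G2 for G1 in S1 for G2 in S2}
    return S          # less precise, but no larger than the input

fs = frozenset
S = {fs("x"), fs("xy"), fs("yz"), fs("xz")}
print(len(iff_k({"y"}, {"z"}, S, 5)), len(iff_k({"y"}, {"z"}, S, 4)))
```

Returning the input unchanged drops the equivalence being added, so the result describes a weaker formula; this is the sense in which precision is traded for space.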

This (space) widening ensures that at each stage of the analysis the size of an ab- 
straction is kept smaller than k. In fact, since the size of the abstraction depends 
on the number of variables, k is defined as a multiple of the number of the vari- 
ables in a clause. This is enough to ensure that, in our interpreter, our space usage 
grows linearly with the size of the program. A widened meet can be obtained 
by replacing each $iff(\{\rho(x_j)\}, \{x_j\}, S'_j)$ of Theorem 1 by $iff_k(\{\rho(x_j)\}, \{x_j\}, S'_j)$.
(Interestingly, a widening for ROBDD’s is described by Fecht [20] that combats 
the space problems that arise in the analysis of high arity predicates.) 
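
The size test of Definition 14 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `rel` is taken to select the groups of S that mention a variable of Y, and the underlying `iff` or `if` operation is passed in as a parameter rather than implemented here.

```python
from typing import Callable, FrozenSet, Set

Group = FrozenSet[str]
Sharing = FrozenSet[Group]


def rel(ys: Set[str], s: Sharing) -> Sharing:
    """Assumed reading of rel(Y, S): the groups of S that mention a variable of Y."""
    return frozenset(g for g in s if g & ys)


def iff_bound(n: int, n1: int, n2: int) -> int:
    """Size bound used in the guard of the widened iff."""
    return n + n1 * n2 - (n1 + n2)


def if_bound(n: int, n1: int, n2: int) -> int:
    """Size bound used in the guard of the widened if."""
    return n + n1 * n2 - n1


def widened(op: Callable, bound: Callable[[int, int, int], int], k: int) -> Callable:
    """Wrap op so that it is only computed when the bound is below k."""
    def op_k(y1: Set[str], y2: Set[str], s: Sharing) -> Sharing:
        s1, s2 = rel(y1, s), rel(y2, s)
        if bound(len(s), len(s1), len(s2)) < k:
            return op(y1, y2, s)
        return s  # otherwise leave the abstraction unchanged
    return op_k
```

The point of the design is that the guard inspects only three set sizes, so the decision to skip an operation costs far less than the operation itself.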

Folklore [8] says that call and answer patterns rarely get updated more than
3-4 times. This is true for many small programs, but in chat_80.pl and aqua_c.pl
we have observed patterns being updated 10-12 times. To bound the number of
iterations that can occur, we widen abstractions if they are updated more than,
say, 8 times. This (time) widening is defined by
$\nabla(S) = S' \cup \{\{x\} \mid x \in var(S) \setminus var(S')\}$ where
$S' = \{G' \in S \mid \forall G \in S . (G \cap G' \neq \emptyset) \Rightarrow (G' \subseteq G)\}$.
Observe that $[S]_\equiv \sqsubseteq [\nabla(S)]_\equiv$ and that
$\alpha_X(\nabla(S)) = (\wedge Y) \wedge (\wedge \{x \leftrightarrow y \mid G \in \nabla(S) \wedge x, y \in G\})$
where $Y = X \setminus var(\nabla(S))$. Formulae of this form occur in the WPos
domain of Codish et al [11] and thus have a maximal chain length that is linear
in $|X|$. This ensures that the number of iterates will be linear in the sum of
the arities of program predicates, and thus provides a time guarantee for a
cautious compiler vendor.
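
As an illustration, the time widening can be read as the following Python sketch. It assumes a sharing abstraction is a set of variable groups, that the kept groups are exactly those contained in every group they intersect, and that each remaining variable becomes a singleton group; the name `widen_time` is hypothetical.

```python
from typing import FrozenSet

Group = FrozenSet[str]
Sharing = FrozenSet[Group]


def variables(s: Sharing) -> FrozenSet[str]:
    """All variables that occur in some group of s."""
    return frozenset().union(*s)


def widen_time(s: Sharing) -> Sharing:
    # Keep a group only if it is a subset of every group it intersects.
    kept = frozenset(
        g1 for g1 in s
        if all(g1 <= g for g in s if g & g1)
    )
    # Every variable lost by the filter is reintroduced as a singleton group.
    missing = variables(s) - variables(kept)
    return kept | frozenset(frozenset({x}) for x in missing)


example = frozenset({frozenset({"x", "y"}), frozenset({"x"}), frozenset({"z"})})
widened_example = widen_time(example)
```

On this example the group {x, y} is dropped because it intersects {x} without being contained in it, and y is then reintroduced as a singleton group.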

7 Experimental work 

To investigate whether a quadratic meet, meet rescheduling and widening are 
enough to obtain an efficient and scalable dependency analysis, we have imple- 
mented an analyser in Prolog as a simple meta-interpreter that uses induced 
magic-sets [9] and eager evaluation [27] to perform goal-dependent bottom-up 
evaluation. Induced magic is a refinement of the magic set transformation, avoid- 
ing much of the re-computation that arises because of the repetition of literals 
in the bodies of magic’ed clauses [9]. It also avoids the overhead of applying the
magic set transformation. Eager evaluation [27] is a fixpoint iteration strategy
which proceeds as follows: whenever an atom is updated with a new (less pre- 
cise) abstraction, a recursive procedure is invoked to ensure that every clause 
that has that atom in its body is re-evaluated. Eager evaluation can involve 
more re-computation than semi-naive iteration but it has the advantages that 
(1) a (Δ-)set of recently updated atoms does not need to be represented; (2)
eager evaluation performs a depth-first traversal of the call-graph so that infor- 
mation about strongly connected components (SCCs) of the call-graph is not as 
important as in semi-naive iteration. Thus we also avoid computing SCCs. 
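
The eager strategy can be sketched as follows. This is a minimal illustration rather than the authors' meta-interpreter: abstractions are modelled as finite sets joined by union, and the names `Clause` and `eager_fixpoint` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Value = FrozenSet[str]
BOTTOM: Value = frozenset()


@dataclass(frozen=True)
class Clause:
    head: str                  # atom defined by the clause
    body: Tuple[str, ...]      # atoms the clause depends on
    fn: Callable[..., Value]   # abstract evaluation of the body


def eager_fixpoint(clauses: List[Clause]) -> Dict[str, Value]:
    table: Dict[str, Value] = {}
    users: Dict[str, List[Clause]] = {}
    for clause in clauses:
        for atom in set(clause.body):
            users.setdefault(atom, []).append(clause)

    def evaluate(clause: Clause) -> None:
        args = [table.get(atom, BOTTOM) for atom in clause.body]
        old = table.get(clause.head, BOTTOM)
        new = old | clause.fn(*args)  # join with the previous abstraction
        if new != old:
            table[clause.head] = new
            # Eager step: immediately revisit every clause that uses the head.
            for user in users.get(clause.head, []):
                evaluate(user)

    for clause in clauses:
        evaluate(clause)
    return table


demo = [
    Clause("a", (), lambda: frozenset({"u"})),
    Clause("b", ("a",), lambda a: a | {"v"}),
    Clause("a", ("b",), lambda b: b),
]
result = eager_fixpoint(demo)
```

No worklist of recently updated atoms and no SCC ordering is kept; the recursion in `evaluate` plays both roles, at the cost of possibly re-evaluating a clause more often than semi-naive iteration would.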

                          fixpoint                                 precision
file            abs     Con     Defn    Defr    Defw    Pos      Con    Def    Defw   Pos
…               …       0.08    2.37    0.74    0.72    oo       88     89     89     -
sim.pl          0.76    0.18    3.62    2.51    2.46    oo       81     100    100    -
ili.pl          0.61    0.13    oo      oo      1.69    19.16    4      -      4      4
lnprolog.pl     ?       ?       0.37    0.53    ?       ?        54     143    143    143
rubik.pl        0.79    0.2     2.0     1.96    1.93    oo       153    160    160    -
strips.pl       0.79    0.04    0.17    0.16    0.16    0.06     144    144    144    144
peval.pl        0.68    0.08    1.92    1.49    1.06    9.34     27     27     27     27
sim_v5-2.pl     0.86    0.11    0.48    0.55    0.54    0.49     100    101    101    101
chat_parser.pl  1.09    0.62    4.27    3.88    3.52    oo       444    505    505    -
aircraft.pl     2.01    8.56    1.34    1.33    1.3     0.35     228    687    687    687
essln.pl        1.49    0.25    2.76    1.43    1.39    17.44    103    155    155    155
chat_80.pl      4.63    1.23    12.89   9.99    9.82    oo       457    839    839    -
aqua_c.pl       12.17   7.07    oo      oo      69.56   oo       1087   -      1227   -


The table summarises our experimental results for applying Def to some of
the largest Prolog and CLP(R) benchmark programs that we could find on the
WWW. The programs are ordered by size, where size is measured in terms of the
number of (distinct abstract) clauses. To assess the precision of the Def analysis,
we have implemented a standard Pos analysis following the technique of Codish
and Demoen [10]. Ideally our Def analysis should match its precision. We have
also modified this analysis to obtain a Con analysis [23]. Ideally our Def analysis
should significantly improve on its precision, since otherwise neither Def nor Pos
is worthwhile! For completeness, we have included the timings for Pos and Con,
but we are primarily concerned with precision. Our Pos analysis is not
state-of-the-art. The abs column gives the time for parsing the files and
abstracting them, that is, replacing built-ins, like arg(X, T, S), with formulae,
like $X \wedge (S \leftarrow T)$. This overhead is the same for all the analyses.
The fixpoint columns give the time to compute the fixpoint. Defn is a naive
implementation of our analysis (that took two person weeks to construct) which
applies compression but not meet rescheduling and widening; Defr additionally
applies meet rescheduling; and Defw applies compression, meet rescheduling and
widening. The Defr and Defw analysers were developed together and took an
additional 4 days to construct. The code for the Defn, Defr and Defw
meta-interpreters (including all the set manipulation utilities) is less than
700 clauses. We widen for time at iteration 8 and widen for space when the number
of variable sets is more than 16 times the number of variables in a clause. Times
are in seconds and oo indicates that the fixpoint calculation timed out after two
minutes. The timings were carried out on a Sun-20 SuperSparc with 64 MByte to
match the architecture of Fecht [20]. The analysers were coded in SICStus 3#5 and
compiled to native code. The precision columns give the total number of ground
arguments in the call and answer patterns: this is an absolute measure which
reflects the usefulness of the analysis for code optimisation. The precision
figures for Defn and Defr are the same and given in column Def.

The experimental results indicate that Defw has good scaling behaviour.
This is the crucial point. Put simply, there are no programs for which Pos
terminates within two minutes and Defw does not (although Pos is sometimes
faster). Usually meet rescheduling gives a speedup and sometimes this speedup
is very dramatic. 10% of the programs, however, run slower with meet
rescheduling. This typically occurs in programs with very few ground arguments
where the effort of rescheduling is not repaid by a reduction in the size of
sharing abstractions. Widening seems to be crucial for scalability, as is
illustrated by reducer, ili and aqua_c. Widening, in fact, is rarely applied. It
is crucial for efficiency, though, because just one large sharing abstraction can
have a disastrous impact on performance. (This also suggests that widening is
necessary in the pair-sharing quotient of Share [3].)

Since our machine matches that of Fecht [20] we can also compare the speed
of our Def analyser to the BDD-based Pos GENA analyser [20]. This is one of
the fastest (perhaps the fastest) Pos analyses described in the literature.
With the sophisticated CallWDFS [20] framework, ann.pl takes 0.18 s, nand.pl
takes 0.31 s, chat_80.pl takes 4.29 s, and aqua_c.pl takes 28.54 s. Since Fecht [20]
does not give processor details for his Sparc-20, we have run our experiments
on the slowest 50MHz model that was manufactured. His machine could well
be almost twice as fast. Even though our framework is not semi-naive, we are
(at most) 2-4 times as slow as GENA. Furthermore, to perform a comparison
against China instantiated with Pos [4], Bagnara has run Defn and China on a
Pentium 200MHz PC with 64 MByte of memory. On trs.pl and chat_80.pl Defn
takes 3.17 s and 12.59 s respectively running interpreted SICStus 3#6 bytecode.
China takes 2.94 s and 6.24 s respectively. It seems reasonable to assume that
with Defw on the same PC, trs.pl and chat_80.pl would take about 2.55 s and
9.59 s respectively, scaling each Defn time by the ratio of the Defw to the Defn
fixpoint time in the table (for chat_80.pl, 12.59 × 9.82/12.89 ≈ 9.59 s). This
performance gap for chat_80.pl would be closed if native code assembly was
available for the PC. To summarise, the experimental results are very encouraging
and despite the simplicity of the interpreter, our Defw analysis appears to be
fast, precise and scalable and, of course, can be implemented easily in Prolog.
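
The chat_80.pl estimate can be checked with a few lines of Python. The scaling rule is an interpretation of the text: the PC time for Defn is multiplied by the ratio of the Defw to the Defn fixpoint time measured on the Sparc; the variable names are illustrative.

```python
# Fixpoint times for chat_80.pl on the Sparc, taken from the table (seconds).
defn_sparc = 12.89
defw_sparc = 9.82

# Defn time for chat_80.pl on the Pentium PC (seconds).
defn_pc = 12.59

# Assumed scaling: Defw on the PC is slower than Defn by the same ratio as on the Sparc.
defw_pc_estimate = defn_pc * defw_sparc / defn_sparc

# Ratio implied by the two trs.pl figures quoted in the text (3.17 s and 2.55 s).
trs_ratio = 2.55 / 3.17

print(f"chat_80.pl with Defw on the PC: about {defw_pc_estimate:.2f} s")
print(f"implied Defw/Defn ratio for trs.pl: about {trs_ratio:.2f}")
```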



8 Related work 

Cortesi et al [15] first pointed out that Share expresses the groundness
dependencies of Def. Quotienting was introduced by Cortesi et al [16] as a
systematic way of obtaining the reference domain of [15]. Like Bagnara et al [3],
we do not fully adhere to the quotienting terminology and methodology of Cortesi
et al [15] but rather follow the standard convention [17] of inducing an
equivalence relation ($\equiv$) from an abstraction map ($\alpha_X$). Also,
Lemma 6.2 of [16] can be interpreted as a way of computing the meet in Def with
the classic abstract unification of Jacobs and Langen [22]. We take this further
and show how the meet can be computed without exponential time closure operations.

Bagnara et al [3] point out that Share includes redundant sharing information
with respect to pair-sharing. This work is related to ours in that our domain may
be viewed as a further quotient of the pair-sharing domain. However, widening
has not been explored for the pair-sharing domain, although we have shown that,
even for our simpler domain, widening is crucial for scalability.

Armstrong et al [1] investigate various normal forms of Boolean functions 
and the relative precision of Pos and Def. C-based implementations of each 
representation are described. For the representations of Pos, it is concluded 
that ROBDD’s give the fastest analysis. A specialised representation for Def, 
based on Dual Blake Canonical Form (DBCF), is found to be the fastest overall. 
For medium-sized programs it is several times faster than ROBDD’s, and it is 
concluded that this is the representation likely to scale best for real programs. 
The precision achieved using Pos was found to be significantly higher than Def, 
although it is remarked that a top-down analyser would improve the precision 
of Def since it is not condensing. Our findings support this remark. 

Bagnara and Schachte [4] develop the idea [2] that a hybrid implementation of 
ROBDD’s that keeps definite information separate from dependency information 
is more efficient than keeping the two together. This hybrid representation can 
significantly decrease the size of ROBDD’s and thus is a useful implementation 
tactic. A comparison with our Def analysis has already been given. Fecht [20] 
compares his Pos analyser to that of Van Hentenryck et al [26] and concludes 
that his analyser is an order of magnitude faster. For reasons of space, the reader 
is referred to [20, pp. 305-307] for more details. Performance figures for another 
hybrid representation are given in [24]. We just observe that [4] and [20] are very 
good systems to measure against. 

Garcia de la Banda et al [21] represent Def functions in terms of a domain
$\wp(X) \times \wp(X \times \wp(\wp(X)))$, so that the Herbrand constraint
$x = f(y, z)$, for example, is represented by
$(\emptyset, \{(x, \{\{y, z\}\}), (y, \{\{x\}\}), (z, \{\{x\}\})\})$ which encodes
$x \leftrightarrow (y \wedge z)$. Abstract conjunction is expressed in terms of
six rewrite rules that put conjunctions of formulae into a normal form. Although
not stated, the normal form is essentially the (orthogonal) reduced monotonic
body form [1] in which a definite function is represented as
$f = \wedge_{x \in X} (x \leftarrow M_x)$ where $M_x \in Mon_X$ and
$x \notin M_x$. Orthogonality ensures that the meet is safe. Our work shows how
this symbolic manipulation of definite functions can be replaced with a simpler
domain and simpler join and meet operations.

Corsini et al [12] describe how variants of Pos can be implemented using
Toupie, a constraint language based on the μ-calculus. This BDD-based analysis
appears to be at least five times as fast as [26] for success pattern analysis.
Thus, if the analyser were extended with magic sets, say, it might lead to a very
respectable goal-dependent analysis.

Codish and Demoen [10] describe a truth-table based implementation technique
for Pos that would encode $(x_1 \leftrightarrow (x_2 \wedge x_3))$ as three
tuples $\langle true, true, true \rangle$, $\langle false, \_, false \rangle$,
$\langle false, false, \_ \rangle$. A widening for this Pos analysis, WPos, is
proposed by Codish et al [11] that amounts to a sub-domain of Def that cannot
propagate dependencies of the form $x \leftarrow (y \wedge z)$, but only simple
dependencies like $(x \leftrightarrow y)$. The main finding of Codish et al [11]
is that WPos loses only a small amount of precision for goal-dependent analysis
of Prolog and CLP(R) programs.
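
A small Python check illustrates the truth-table encoding. It assumes that `_` is a wildcard matching either truth value and that the three tuples are (true, true, true), (false, _, false) and (false, false, _); the helper names are hypothetical, and `None` stands in for the wildcard.

```python
from itertools import product


def models(formula, arity):
    """All truth assignments (as tuples) that satisfy the formula."""
    return {bits for bits in product([True, False], repeat=arity) if formula(*bits)}


def expand(pattern):
    """All assignments matched by a tuple in which None is a wildcard."""
    choices = [[True, False] if value is None else [value] for value in pattern]
    return set(product(*choices))


def iff_example(x1, x2, x3):
    # x1 <-> (x2 and x3)
    return x1 == (x2 and x3)


patterns = [(True, True, True), (False, None, False), (False, False, None)]
covered = set().union(*(expand(p) for p in patterns))
exact = covered == models(iff_example, 3)
```

Under these assumptions `exact` holds: the three tuples cover exactly the four models of the formula, with the assignment (false, false, false) matched by two of them.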

9 Conclusions 

We have developed the link between Def and Share to show how the meet 
of Def can be modelled with an efficient (quadratic) operation on Share. We 
have shown how to represent formulae succinctly with equivalence classes of 
sharing abstractions, and how formulae can be widened so as to avoid bad space 
behaviour. Putting these ideas together we have achieved a practical analysis 
that is fast, precise, robust and can be implemented easily in Prolog. 

Abstract. We define an extension of the π-calculus with a static type
system which supports high-level specifications of extended patterns of
communication, such as client-server protocols. Subtyping allows protocol
specifications to be extended in order to describe richer behaviour; an
implemented server can then be replaced by a refined implementation,
without invalidating type-correctness of the overall system. We use the
POP3 protocol as a concrete example of this technique.



1 Introduction 

Following its early success as a framework for the investigation of the
foundations of various concurrent programming styles, the π-calculus [12] has also
become established as a vehicle for the exploration of type systems for
concurrent programming languages [2,7,9,11,15,21]. Inter-process communication in
the π-calculus is based on point-to-point transmission of messages along named
channels, and many proposed type systems have started from the assignment of
types to channels, so that the type of a channel determines what kind of message
it can carry. Because messages can themselves be channel names, this
straightforward idea leads to detailed specifications of the intended uses of
channels within a system. This line of research has not been purely theoretical:
the Pict programming language [16] is directly based on the π-calculus and has a
rich type system which incorporates subtyping on channel types and higher-order
polymorphism.

Honda et al. [5,20] have proposed π-calculus-like languages in which certain
channels can be given session types. Such a channel, which we will call a session
channel, is not restricted to carrying a single type of message for the whole
of its lifetime; instead, its type specifies a sequence of message types. Some
of the messages might indicate choices between a range of possibilities, and
different choices could lead to different specifications of the types of subsequent
messages; session types therefore have a branching structure. One application
of session types is the specification of complex protocols, for example in
client-server systems. The main contribution of the present paper is to add
subtyping to a system of session types, and show that this strengthens the
application to the specification of client-server protocols. The differences in
syntax between our language and the π-calculus are minimal; all the special
treatment of session channels is handled by the typing rules. We anticipate that
this will make it easier to achieve our next goal of incorporating our type system
into a modified version of the Pict language and compiler.

Consider a server for mathematical operations, which initially offers a choice 
between addition and negation. A client must choose an operation and then send 
the appropriate number of arguments to the server, which responds by sending 
back the result. All communications take place on a single session channel called 
X, whose session type is 

S = &(plus : ?[int] . ?[int] . ![int] . end, negate: ?[int] . ![int] . end). 

More precisely, this is the type of the server side of the channel. The &(...)
constructor specifies that a choice is offered between, in this case, two options,
labelled plus and negate. Each label leads to a type which describes the
subsequent communication on x; note that the two branches have different types,
in which ?[int] indicates receiving an integer, ![int] indicates sending an integer,
. is the sequencing constructor, and end indicates the end of the interaction.
The client side of the channel x has a dual or complementary type, written S̄.
Explicitly,

S̄ = ⊕(plus: ![int] . ![int] . ?[int] . end, negate: ![int] . ?[int] . end).

The ⊕(...) constructor specifies that the client makes a choice between plus and
negate. Again, each label is followed by a type which describes the subsequent
interaction; the pattern of sending and receiving is the opposite of the pattern
which appears on the server side.
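
To make the relationship between the two sides concrete, here is a small Python sketch that computes the dual of a session type. The tuple encoding and the function name `dual` are illustrative assumptions, not notation from the paper.

```python
# A session type is encoded as a nested tuple:
#   ("end",)                      end of the interaction
#   ("in",  payload, rest)        ?[payload] . rest
#   ("out", payload, rest)        ![payload] . rest
#   ("branch", ((label, S), ...)) &(label: S, ...)
#   ("choice", ((label, S), ...)) the choice constructor dual to &


def dual(s):
    """Swap inputs with outputs and branches with choices, recursively."""
    tag = s[0]
    if tag == "end":
        return s
    if tag == "in":
        return ("out", s[1], dual(s[2]))
    if tag == "out":
        return ("in", s[1], dual(s[2]))
    if tag == "branch":
        return ("choice", tuple((label, dual(rest)) for label, rest in s[1]))
    if tag == "choice":
        return ("branch", tuple((label, dual(rest)) for label, rest in s[1]))
    raise ValueError(f"unknown session type constructor: {tag}")


END = ("end",)
INT = ("int",)

# The maths server type S from the text.
server_type = ("branch", (
    ("plus", ("in", INT, ("in", INT, ("out", INT, END)))),
    ("negate", ("in", INT, ("out", INT, END))),
))

client_type = dual(server_type)
```

Applying `dual` twice gives back the original type, which matches the intuition that the two sides of a channel are complementary to each other.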

An implementation of a maths server must use x in accordance with the
type S, and an implementation of a client must use x in accordance with the
type S̄. These requirements can be enforced by static typechecking, and it is
then guaranteed that no communication errors will occur at runtime. When the
client chooses a label, it is guaranteed to be one of the labels offered by the
server; when the client sends a subsequent message, it is guaranteed to be of the
type expected by the server; and similarly when the server sends a message.

The typing rules to be introduced in Section 3 will allow the derivation of

x : S^1 ⊢ server

where

server = x ▷ { plus: x ? [a : int] . x ? [b : int] . x ! [a + b] . 0,
               negate: x ? [a : int] . x ! [−a] . 0 }.

The operation ▷ allows a message on x to choose between the listed alternatives.
The labels are the same as those in S, and the pattern of inputs (x ? [a : int] ...)
and outputs (x ! [a + b]) matches that in S. The usage annotation of 1 on S
indicates that only one side of x is being used by server.

One possible definition of a client is

client = x ◁ negate . x ! [2] . x ? [a : int] . 0

and we can derive the typing judgement

x : S̄^1 ⊢ client.

Note the use of ◁ to select from the available options, and that the subsequent
pattern of inputs and outputs matches the specification in S̄. This client does
not do anything with the value received from the server; more realistically, 0
would be replaced by some continuation process which used a.

The client and the server can be put in parallel using the typing rule T-Par
for parallel composition:

x : S^1 ⊢ server    x : S̄^1 ⊢ client
------------------------------------
x : S^2 ⊢ server | client

where the usage annotation 2 on S indicates that both sides of x are being used.

The usage annotations are necessary in order to ensure that each side of x is
only used by one process. The system server | client | client is erroneous because
both clients are trying to use x to communicate with the same server. If this
system is executed, one client will use x to choose either plus or negate. After
that, the server expects to receive an integer on x, but the other client will again
use x to choose between plus and negate. This is a runtime type error of the kind
that the type system is designed to avoid. We will see later that

x : S̄^1 ⊢ client    x : S̄^1 ⊢ client
------------------------------------
x : S̄^2 ⊢ client | client

is not a valid application of the typing rule T-Par.

To avoid runtime type errors we must ensure that session channels are used 
linearly [3]. Our typing rules use techniques similar to those of Kobayashi et 
al. [9] to enforce linearity. The type system also allows non-session types to be 
specified, and there are no restrictions on how many processes may use them. 
For example, y : ^[int] is a channel which can be used freely to send or receive 
integers. 

How, then, can we implement a server which can be used by more than one 
client? The solution is for each client to create a session channel which it will use 
for its own interaction with the server. The server consists of a replicated thread 
process; each thread receives a session channel along a channel called port and 
uses it to interact with a client. 

thread = port ? [x : S^1] . server
newserver = !thread

client1body = y ◁ negate . y ! [2] . y ? [a : int] . 0
client1 = (νy : S^2) port ! [y] . client1body
client2body = z ◁ plus . z ! [1] . z ! [2] . z ? [b : int] . 0
client2 = (νz : S^2) port ! [z] . client2body

Now

port : ^[S^1] ⊢ newserver | client1 | client2
is a valid typing judgement. Because port does not have a session type, it can be
used by all three processes. When this system is executed, client1 sends the local
channel y to one copy of thread; the standard π-calculus scope extrusion allows
the scope of y to expand to include that copy of thread, and then client1body
and server have a private interaction on y. Similarly client2 and another copy
of thread use the private channel z for their communication. Notice that y is a
session channel with usage 2 in client1, and indeed both sides of y are used: the
side whose type is S is used by being sent away on port, and the side whose type
is S̄ is used for communication by client1body.

The final ingredient of our type system is subtyping. On non-session types,
subtyping is defined exactly as in Pierce and Sangiorgi’s type system for the
π-calculus [15]: if ∀i ∈ {1, ..., n}. T_i ≤ U_i then ^[T_1, ..., T_n] ≤ ?[U_1, ..., U_n]
and ^[U_1, ..., U_n] ≤ ![T_1, ..., T_n]. Channels whose type permits both input and
output (^) can be used in positions where just input or just output is required.
We also have ?[T_1, ..., T_n] ≤ ?[U_1, ..., U_n] and ![U_1, ..., U_n] ≤ ![T_1, ..., T_n];
recall that input behaves covariantly and output behaves contravariantly.

Subtyping on sequences T_1 . ··· . T_n is defined pointwise, again with ? acting
covariantly and ! acting contravariantly. More interesting is the definition of
subtyping for branch and choice types. If a process needs a channel of type
&(l_1 : T_1, ..., l_n : T_n) which allows it to offer a choice from {l_1, ..., l_n},
then it can safely use a channel of type &(l_1 : T_1, ..., l_m : T_m), where
m ≤ n, instead. The channel type prevents the process from ever receiving labels
l_{m+1}, ..., l_n but every label that can be received will be understood.
Furthermore, a channel of type &(l_1 : S_1, ..., l_m : S_m) can be used if each
S_i ≤ T_i, as this means that after the choice has been made the continuation
process uses a channel of type S_i instead of T_i and this is safe. In Section 5
we will see how subtyping can be used to describe modifications to the
specification of a server.
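
The subtyping principles described so far can be sketched as a small checker. This is a simplified illustration under stated assumptions: types use a hypothetical tuple encoding, recursive types and usage annotations are left out, and choice types are only related to themselves because their rule is not spelled out in the text above.

```python
# Ground types are strings ("int", "bool"). Other types are tuples:
#   ("?", (T, ...)), ("!", (T, ...)), ("^", (T, ...))   channel types
#   ("end",), ("in", (T, ...), S), ("out", (T, ...), S)  session sequences
#   ("branch", ((label, S), ...))                        &(label: S, ...)


def all_sub(ts, us):
    """Pointwise subtyping on two sequences of equal length."""
    return len(ts) == len(us) and all(subtype(t, u) for t, u in zip(ts, us))


def subtype(t, u):
    if t == u:
        return True
    if isinstance(t, str) or isinstance(u, str):
        return False  # distinct ground types are unrelated
    tt, ut = t[0], u[0]
    if ut == "?" and tt in ("?", "^"):
        return all_sub(t[1], u[1])   # input is covariant
    if ut == "!" and tt in ("!", "^"):
        return all_sub(u[1], t[1])   # output is contravariant
    if tt == ut == "in":
        return all_sub(t[1], u[1]) and subtype(t[2], u[2])
    if tt == ut == "out":
        return all_sub(u[1], t[1]) and subtype(t[2], u[2])
    if tt == ut == "branch":
        tb, ub = dict(t[1]), dict(u[1])
        # Fewer labels on the left; shared labels must be related covariantly.
        return set(tb) <= set(ub) and all(subtype(tb[l], ub[l]) for l in tb)
    return False


END = ("end",)
negate_only = ("branch", (("negate", ("in", ("int",), ("out", ("int",), END))),))
plus_and_negate = ("branch", (
    ("plus", ("in", ("int",), ("in", ("int",), ("out", ("int",), END)))),
    ("negate", ("in", ("int",), ("out", ("int",), END))),
))
```

Here `subtype(negate_only, plus_and_negate)` holds, so a process written to offer both plus and negate can safely be given a channel on which only negate will ever be selected.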

The remainder of the paper is organised as follows. Section 2 defines the 
syntax of processes and types, and some basic operations on type environments. 
The typing rules are presented in Section 3. Section 4 defines the operational 
semantics of the language and states the main technical results leading to type 
soundness. Section 5 uses our type system to specify the POP3 protocol, and
discusses the role of subtyping. Finally we discuss related work, and outline our 
future plans, in Section 6. 



2 Syntax and Notation 

Our language is based on a polyadic π-calculus with output prefixing [12]. We
omit the original π-calculus choice construct P + Q, partly in order to keep the
language close to the core of Pict [16]. However, we have the constructs
introduced in Section 1 for choosing between a collection of labelled processes, as
proposed by Honda et al. [5,20]. We also omit the matching construct, which
allows channel names to be tested for equality, again because it is not present in
core Pict. The inclusion of output prefixing is different from many recent
presentations of the π-calculus, but it is essential because our type system must
be able to impose an order on separate outputs on the same channel. It is
convenient to add a conditional process expression, written if b then P else Q where
b is a boolean value, and therefore we also have a ground type of booleans;
other ground types, such as int as used in the examples in Section 1, could be
added along with appropriate primitive operations. As is standard, we use the
replication operator ! instead of recursive process definitions.

The type system has separate constructors for input-only, output-only and 
dual-capability channels, as suggested by Pierce and Sangiorgi [15]. It also has 
constructors for session types, as proposed by Honda et al. [5,20]. The need for 
linear control of session channels leads to the usage annotations on session types, 
which play a similar role to the polarities of Kobayashi et al. [9]. Subtyping will 
be defined in Section 3. 

In general we use lower case letters for channel names, l_1, ..., l_n for labels of
choices, upper case P, Q, R for processes, and upper case T, U etc. for types. We
write x̃ for a finite sequence x_1, ..., x_n of names, and x̃ : T̃ for a finite sequence
x_1 : T_1, ..., x_n : T_n of typed names.



2.1 Processes 



The syntax of processes is defined by the following grammar. Note that T and
T̃ stand for types and lists of types, which have not yet been defined.

P ::= 0
    | P | Q
    | x ? [ỹ : T̃] . P
    | x ! [ỹ] . P
    | (νx : T)P
    | x ▷ {l_1 : P_1, ..., l_n : P_n}
    | x ◁ l . P
    | !P
    | if x then P else Q



Most of this syntax is fairly standard. 0 is the inactive process, | is parallel
composition, (νx : T)P declares a local name x of type T for use in P, and
!P represents a potentially infinite supply of copies of P. x ? [ỹ : T̃] . P receives
the names ỹ, which have types T̃, along the channel x, and then executes P.
x ! [ỹ] . P outputs the names ỹ along the channel x and then executes P. There
should be no confusion between the use of ! for output and its use for replication,
as the surrounding syntax is quite different in each case.
x ▷ {l_1 : P_1, ..., l_n : P_n} offers a choice of subsequent behaviours: one of
the P_i can be selected as the continuation process by sending the appropriate
label l_i along the channel x, as explained in Section 1. x ◁ l . P sends the label
l along x in order to make a selection from an offered choice, and then executes P.
The conditional expression has already been mentioned.

We define free and bound names as usual: x is bound in (νx : T)P, the names
in ỹ are bound in x ? [ỹ : T̃] . P, and all other occurrences are free. We then define
α-equivalence as usual, and identify processes which are α-equivalent. We also
define an operation of substitution of names for names: P{x̃/ỹ} denotes P with
the names x_1, ..., x_n simultaneously substituted for y_1, ..., y_n, assuming that
bound names are renamed if necessary to avoid capture of substituting names.

As usual we define a structural congruence relation, written ≡, which helps to
define the operational semantics. It is the smallest congruence (on α-equivalence
classes of processes) closed under the following rules.



P | 0 ≡ P                                                   S-Unit
P | Q ≡ Q | P                                               S-Comm
P | (Q | R) ≡ (P | Q) | R                                   S-Assoc
(νx : T)P | Q ≡ (νx : T)(P | Q)   if x is not free in Q     S-Extr
(νx : T)0 ≡ 0                                               S-Nil
(νx : T)(νy : U)P ≡ (νy : U)(νx : T)P                       S-Perm
!P ≡ P | !P                                                 S-Rep
x ▷ {l_1 : P_1, ..., l_n : P_n} ≡ x ▷ {l_σ(1) : P_σ(1), ..., l_σ(n) : P_σ(n)}   S-Offer

In rule S-Offer, σ is a permutation on {1, ..., n}.



2.2 Types 

The syntax of types is defined by the following grammar.

Ground types              G ::= bool
Channel types             C ::= ?[T_1, ..., T_n]
                              | ![T_1, ..., T_n]
                              | ^[T_1, ..., T_n]
Session types             S ::= end
                              | ?[T_1, ..., T_n] . S
                              | ![T_1, ..., T_n] . S
                              | &(l_1 : S_1, ..., l_n : S_n)
                              | ⊕(l_1 : S_1, ..., l_n : S_n)
Annotated session types   A ::= S^1 | S^2
Types                     T ::= G | C | A
                              | X        (type variable)
                              | μX.T     (recursive type)

The usage annotation, or just usage, of a session type indicates how a channel
of that type can be used: if x : S^1 then x can only be used as specified by S, but
if x : S^2 then both sides of x can be used, including the side described by S̄. We
omit usage annotations from end, and often omit usage annotations of 1.

We define the unwinding of a recursive type: unwind(μX.T) = T{μX.T/X}.
If T is a type then T̄ is the dual (or complementary) type of T, defined
inductively as follows.

$$\overline{\&(l_1 : S_1, \ldots, l_n : S_n)} = \oplus(l_1 : \overline{S_1}, \ldots, l_n : \overline{S_n})$$
$$\overline{\oplus(l_1 : S_1, \ldots, l_n : S_n)} = \&(l_1 : \overline{S_1}, \ldots, l_n : \overline{S_n})$$
$$\overline{\mathsf{bool}} = \mathsf{bool} \qquad \overline{?[\tilde{T}]} = {!}[\tilde{T}]$$
$$\overline{\mathsf{end}} = \mathsf{end} \qquad \overline{{!}[\tilde{T}]} = {?}[\tilde{T}]$$
$$\overline{X} = X \qquad \overline{\mu X.T} = \mu X.\overline{T}$$

S^2 and S̄^2 both describe a session channel of which both ends are being used.
We adopt the convention that when S^2 is written, S is either a branching type
or begins with an input. We say that a session type is complete if it has usage
2; it is incomplete if it is end or has usage 1.



2.3 Environments 

An environment is a set of typed names, written x_1 : T_1, ..., x_n : T_n. We use
Γ, Δ etc. to stand for environments. We assume that all the names in an environment
are distinct. We write x ∈ Γ to indicate that x is one of the names appearing in
Γ, and then write Γ(x) for the type of x in Γ. When x ∉ Γ we write Γ, x : T for
the environment formed by adding x : T to the set of typed names in Γ. When
Γ and Δ have disjoint sets of names, we write Γ, Δ for their union. Implicitly,
true : bool and false : bool appear in every environment.

The partial operation + on types is defined by

T + T = T            if T is a ground type, a channel type, or end
S^1 + S̄^1 = S^2      if S is a session type

and is undefined in all other cases.

The partial operation +, combining a typed name with an environment, is
defined as follows:

Γ + x : T = Γ, x : T                      if x ∉ Γ
(Γ, x : T) + x : U = Γ, x : (T + U)       if T + U is defined

and is undefined in all other cases.

We extend + to a partial operation on environments by defining

Γ + (x_1 : T_1, ..., x_n : T_n) = (···(Γ + x_1 : T_1) + ···) + x_n : T_n.

We say that an environment is unlimited if it contains no session types except
for end.
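
The partial + operation can be illustrated with a short Python sketch. The encoding is an assumption made for the example: ground types, channel types and end are plain strings, a session type is identified by a name plus a flag saying whether it is the dual side, and `None` stands for "undefined".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Session:
    name: str           # stands for a session type S
    dual: bool = False  # True for the complementary side
    usage: int = 1      # usage annotation, 1 or 2


Type = Union[str, Session]  # strings encode ground types, channel types and end
Env = Dict[str, Type]


def add_types(t: Type, u: Type) -> Optional[Type]:
    if isinstance(t, str) and t == u:
        return t  # T + T = T for ground types, channel types and end
    if (isinstance(t, Session) and isinstance(u, Session)
            and t.name == u.name and t.dual != u.dual
            and t.usage == 1 and u.usage == 1):
        return Session(t.name, dual=False, usage=2)  # the two sides combine
    return None  # undefined in all other cases


def add_binding(env: Env, x: str, t: Type) -> Optional[Env]:
    if x not in env:
        return {**env, x: t}
    combined = add_types(env[x], t)
    return None if combined is None else {**env, x: combined}


def add_envs(env: Env, other: Env) -> Optional[Env]:
    result: Optional[Env] = env
    for x, t in other.items():
        result = add_binding(result, x, t)
        if result is None:
            return None
    return result


server_env = {"x": Session("S")}
client_env = {"x": Session("S", dual=True)}
ok = add_envs(server_env, client_env)      # both sides of x are now in use
clash = add_envs(client_env, client_env)   # two clients on the same side
```

In this encoding `ok` records usage 2 for x, while `clash` is `None`, mirroring why server | client is typable and client | client is not.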

3 The Type System 

3.1 Subtyping 

The principles behind the definition of subtyping have been described in
Section 1. Figure 1 defines the subtype relation formally by means of a collection
of inference rules for judgements of the form Σ ⊢ T ≤ U, where Σ ranges over finite
sets of instances of ≤. When ∅ ⊢ T ≤ U is derivable we simply write T ≤ U.
The inference rules can be interpreted as an algorithm for checking whether
T ≤ U for given T and U, as follows. Beginning with the goal ∅ ⊢ T ≤ U, apply
the rules upwards to generate subgoals; pattern matching on the structure of T
and U determines which rule to use, except that the rule AS-Assump should
always be used, causing the current subgoal to succeed, if it is applicable. If
both AS-Rec-L and AS-Rec-R are applicable then they should be used in
either order. If a subgoal is generated which does not match any of the rules, the
algorithm returns false.

Pierce and Sangiorgi [15] give two definitions of their subtype relation: one by 
means of inference rules (as in Figure 1) and one as a form of type simulation, de- 
fined coinductively. The subtyping algorithm derived from the inference rules can 
then be proved sound and complete with respect to the coinductive definition, 
while the coinductive definition permits straightforward proofs of transitivity 
and reflexivity of the subtype relation. In the same way, we can characterize our 
subtype relation coinductively, prove soundness and completeness of the subtyp- 
ing algorithm, and prove transitivity and reflexivity; due to space constraints, 
we have omitted the details from the present paper. 

The subtype relation is defined on non-annotated types, but annotations
preserve subtyping: if i ∈ {1,2} then S^i ≤ T^i if and only if S ≤ T.

If T̃ and Ũ have the same length, n, and ∀i ∈ {1, ..., n}. Ti ≤ Ui, we write
T̃ ≤ Ũ.



3.2 Typing rules 

The typing rules are defined in Figure 2. Note that a judgement of the form
Γ ⊢ x : T ≤ U means x : T ∈ Γ and T ≤ U. Subtyping appears in the
hypotheses of rules T-Out, T-OutSeq, T-In and T-InSeq, where it must be
possible to promote the type of the channel to the desired input or output type.
It appears less explicitly in rules T-Offer and T-Choose, where the type of x
must include enough labels for the choice being constructed.

Each typing rule is only applicable when any instances of + which it contains
are actually defined. This ensures that the environment correctly records the use 
being made of session channels. Consider again the two applications of T-Par 
from Section 1. 

a; : 5'^ h server a; : 5^ h client a; : 5^ h client a; : 5^ h client 

a; : 5^ h server | client a; : h client | client 

The first is correct because = 5^. The second is incorrect because 

is not defined; this prevents the session channel x from being used simultaneously 

by both copies of client. 

Notice also that in rules T-Out and T-OutSeq, the names being output 
are added to the environment; this means that if a session channel is output then 
the part of its usage which is given away cannot be used again by the remainder 
of the process. This allows a process to begin a communication on a session 
channel, then delegate the rest of the session to another process by sending it 
the channel; of course, the first process must not use the channel again. Such 
behaviour arises when a recursive process, which uses a session channel (of a 
recursive type), is represented in terms of replication: when the next instance of 
the recursive process is invoked, the session channel must be passed on. 
AS-Assump   T ≤ U ∈ Σ  ⟹  Σ ⊢ T ≤ U

AS-Bool     Σ ⊢ bool ≤ bool

AS-End      Σ ⊢ end ≤ end

AS-In       Σ ⊢ T̃ ≤ Ũ  ⟹  Σ ⊢ ?[T̃] ≤ ?[Ũ]  and  Σ ⊢ ^[T̃] ≤ ?[Ũ]

AS-Out      Σ ⊢ Ũ ≤ T̃  ⟹  Σ ⊢ ![T̃] ≤ ![Ũ]  and  Σ ⊢ ^[T̃] ≤ ![Ũ]

AS-InOut    Σ ⊢ T̃ ≤ Ũ and Σ ⊢ Ũ ≤ T̃  ⟹  Σ ⊢ ^[T̃] ≤ ^[Ũ]

AS-InSeq    Σ ⊢ V ≤ W    Σ ⊢ T̃ ≤ Ũ  ⟹  Σ ⊢ ?[T̃] . V ≤ ?[Ũ] . W

AS-OutSeq   Σ ⊢ V ≤ W    Σ ⊢ Ũ ≤ T̃  ⟹  Σ ⊢ ![T̃] . V ≤ ![Ũ] . W

AS-Branch   m ≤ n    ∀i ∈ {1, ..., m}. Σ ⊢ Si ≤ Ti
            ⟹  Σ ⊢ &⟨l1:S1, ..., lm:Sm⟩ ≤ &⟨l1:T1, ..., ln:Tn⟩

AS-Choice   ∀i ∈ {1, ..., m}. Σ ⊢ Ti ≤ Si
            ⟹  Σ ⊢ ... ≤ ⊕⟨l1:T1, ..., lm:Tm⟩

AS-Rec-L    Σ, μX.S ≤ T ⊢ unwind(μX.S) ≤ T  ⟹  Σ ⊢ μX.S ≤ T

AS-Rec-R    Σ, T ≤ μX.S ⊢ T ≤ unwind(μX.S)  ⟹  Σ ⊢ T ≤ μX.S



Fig. 1. Inference rules for subtyping
Γ unlimited        Γ ⊢ P    Δ ⊢ Q

T-Nil T-Par 

rho r + zii-p|Q 

r, X : T \- P if T is a session type it must be complete 

T-New 

Ph {vx-. T)P 

rhP Pha;^![f] P,y:f|-P P h ® ^ ?[f] 

T-Out T-In 

P + y-.fhx\\y].P Ph xl[y-.f].P 

P,x ■. S \- P S is incomplete U 

T-OutSeq 

{P,x-.\\T].S)+y-.U^x\\y].P 

P, X S,y U \- P S is incomplete T ^ P 

T-InSeq 

P, a; : ?[T] . S' h a; ? [j/ : P] . P 

P, a; : Si h Pi . . . P, a; : St, h Pn each Si is incomplete m ^ n 

T-Offer. 

P, X : &(p :Si, . . . , X t> {h : Pi, , l„:P„} 

P, X : Si*' h P Si = end or ti = 1 

T-Choose 

P, X : (B{h :Si,...,l„: S„)* \~ x <li . P 

P \- X : bool PhPPhQ P \~ P P unlimited 

T-Cond T-Rep 

P h if a; then P else Q P h!P 



Fig. 2. Typing rules 



R-Comm     x ? [ỹ : T̃] . P | x ! [z̃] . Q → P{z̃/ỹ} | Q

R-Select   i ∈ {1, ..., n}  ⟹  x ▷ {l1 : P1, ..., ln : Pn} | x ◁ li . Q → Pi | Q

R-Par      P → P'  ⟹  P | Q → P' | Q

R-Cong     P' ≡ P    P → Q    Q ≡ Q'  ⟹  P' → Q'

R-New      P → P'  ⟹  (νx : T)P → (νx : T)P'

R-True     if true then P else Q → P

R-False    if false then P else Q → Q



Fig. 3. The reduction relation
Finally, note that a session type T.end effectively specifies a linear non-session
channel of type T as used in [9].

4 Operational Semantics 

As usual for π-calculus-like languages, the operational semantics is defined by
means of a reduction relation [11]. P → Q means that the process P reduces
to the process Q by executing a single communication step (or evaluating a
conditional expression). The reduction relation is the smallest relation closed
under the rules in Figure 3, most of which are standard. The rules R-Comm and
R-Select introduce communication steps; R-Comm is standard and R-Select
introduces communications which select labelled options. Note that R-Comm
applies to both session and non-session channels, as there is no indication of the
type of x in either process.

The usual way of proving type soundness is first to prove a subject reduction
theorem: if Γ ⊢ P and P → Q then there is an environment Δ such that Δ ⊢ Q.
Then, one proves that if Γ ⊢ P then the immediately available communications
in P do not cause type errors. Together these results imply that a well-typed
process can be executed safely through any sequence of reduction steps. However,
the presence of subtyping means that examining P is not sufficient to determine
what constitutes correct use of names in P; different occurrences of a single
name in P might be constrained to have different types. For example, the typed
process

a : ^[![bool]], b : ^[?[bool]], x : ^[bool] ⊢
    a ! [x] . b ! [x] . 0 | a ? [y : ![bool]] . P | b ? [z : ?[bool]] . Q

reduces in two steps to, essentially, x : ^[bool] ⊢ P{x/y} | Q{x/z}, and
occurrences of x in P have type ![bool] but those in Q have type ?[bool].

To address this difficulty, we adopt Pierce and Sangiorgi’s technique of
introducing tagged processes [15], written E, F, etc. instead of P, Q, etc. The
syntax of tagged processes is identical to that of ordinary processes except that
all occurrences of names are typed, for example (x : T) ! [y : U] . E. Structural
congruence is defined on tagged processes by the same rules, with tags added, as
for untagged processes. We also introduce tagged typing rules, defining
judgements of the form Γ ⊢ E. The tagged typing rules are essentially the same as
the untagged typing rules; a typical example is the rule TT-Out.

Γ ⊢ E    Γ ⊢ x ≤ T ≤ ![Ũ]    Ṽ ≤ Ũ

TT-Out

Γ + ỹ : Ṽ ⊢ (x : T) ! [ỹ : Ũ] . E

Note that the type declared for a name in the environment must be a subtype
of the type with which that name is tagged in the process.

The tagged reduction relation is written Γ ⊢ E → Δ ⊢ F and is defined by
the rules in Figure 4, together with tagged versions of the rules R-Par, R-Cong,
R-New, R-True and R-False.
T ≤ ?[Ũ]    V ≤ ![W̃]    W̃ ≤ Ũ

TR-Comm

Γ, x : ^[S̃] ⊢ (x : T) ? [ỹ : Ũ] . P | (x : V) ! [z̃ : W̃] . Q
→ Γ, x : ^[S̃] ⊢ P{z̃/ỹ} | Q



Sf^7[U].R T^![Q].P Wf^U 

TR-CommSeq 

r,x :7[f] .R^ \- {x : S)7 [y -.0] .P \ {x :V)\[z :W] .Q 
P,x-.R^h P{z/y}\Q 



m I e {h,. . .,ln} 

TR-Select 

P, X : &(/i : Ti , . . . , P : h (x : S) > {/i : Pi , . . . , P : P„} |(x : T) < P Q 

^ P, X : P h Pi I Q 



Fig. 4. The tagged reduction relation (selected rules) 



The function Erase from tagged processes to untagged processes simply
removes the extra type information. The definition is straightforward, for example

Erase((x : T) ! [y : U] . E) = x ! [y] . Erase(E).
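
As an illustration of how such an erasure function proceeds by structural recursion, here is a minimal Python sketch covering only output prefixes, parallel composition and the nil process. The class names and the representation of types as strings are assumptions made for this example, not part of the calculus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:                  # the inert process 0 (same in both syntaxes)
    pass


@dataclass(frozen=True)
class TaggedOut:            # (x : T) ! [y : U] . E
    chan: str
    chan_type: str
    arg: str
    arg_type: str
    cont: object


@dataclass(frozen=True)
class TaggedPar:            # E | F
    left: object
    right: object


@dataclass(frozen=True)
class Out:                  # x ! [y] . P
    chan: str
    arg: str
    cont: object


@dataclass(frozen=True)
class Par:                  # P | Q
    left: object
    right: object


def erase(e):
    """Drop every type tag, keeping the process structure unchanged."""
    if isinstance(e, TaggedOut):
        return Out(e.chan, e.arg, erase(e.cont))
    if isinstance(e, TaggedPar):
        return Par(erase(e.left), erase(e.right))
    return e                # Nil carries no tags
```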



Theorem 1 (Tagged Subject Reduction). If Γ ⊢ E → Δ ⊢ F and Γ ⊢ E
is derivable then Δ ⊢ F is derivable.

Proof. By induction on the derivation of Γ ⊢ E → Δ ⊢ F. The assumption
that Γ ⊢ E is derivable provides the information about the components of E
which is needed to build a derivation of Δ ⊢ F.

Observe that the Tagged Subject Reduction Theorem guarantees that the
tagged reduction relation is well-defined as a relation on derivable tagged typing
judgements.

Lemma 1. If Γ ⊢ P is a derivable untagged typing judgement, then there is a
tagged process E such that P = Erase(E) and Γ ⊢ E is a derivable tagged typing
judgement.

Proof. We can define a function Tag_Γ(P), by induction on the structure of P,
which essentially tags every name in P with its exact type as declared in Γ or by
a binding ν or input. The presence of session types causes a slight complication:
if X : 5^ G F and x is used in both P and Q, then Tagj-(F | Q) must ensure 
that X is tagged with in P and with S in Q, or vice versa. Essentially the 
same problem is encountered in linear type inference [10], and we use the same
solution: Tag_Γ(P) returns a pair (P', Γ') where P' is a tagged process and Γ'
differs from Γ only by the possible removal of some usages of session types. Then
Tag_Γ(P | Q) = (P' | Q', Γ'') where Tag_Γ(P) = (P', Γ') and Tag_Γ'(Q) = (Q', Γ'').
When Γ ⊢ P and Tag_Γ(P) = (P', Γ') we have that Γ' is unlimited (so all session
types have been removed), ok_Γ(P') and P = Erase(P').

Theorem 2. If Γ ⊢ P is derivable and Γ ⊢ E is derivable and P = Erase(E)
and P →* Q then there exists Δ ⊢ F such that Γ ⊢ E →* Δ ⊢ F and Q =
Erase(F).

Proof. By breaking the sequence of reductions into individual steps, and showing 
that the result holds for each step; the latter fact can be proved by induction on 
the derivation of the reduction step. 

The Tagged Subject Reduction Theorem, Lemma 1 and Theorem 2 imply
that any sequence of reductions from a well-typed untagged process can be
mirrored by a sequence of reductions from a well-typed tagged process. The final
theorem establishes that well-typed tagged processes do not contain any
immediate possibilities for incorrect communication. It follows easily from the tagged
typing rules; most of the work in proving type soundness is concentrated into the
proof of the Tagged Subject Reduction Theorem. Each case of the conclusion
shows that whenever a tagged process appears to contain a potential reduction,
the preconditions for the relevant tagged reduction rule are satisfied and the
reduction can safely be carried out.

Theorem 3. If P is a tagged process, P h Erase(P) and okr(P), then 

1. ifP= {nx : l)((a : T) I [y : U] . Pi \ {a : V) I [z : W] . P 2 \ Q) and T is not 

a session type then the declaration a : S occurs in either P or x : X, with 

5 < T ^ ?[P] andS^V ^ \[W] and W^U. 

2. ifP= {vx : X){{a ■. T .T')l [y ■. U] . Pi\{a . V) !J5 : IT] . P 2 | Q) then the 
declaration a : 5^ occurs in either P or x : X , with S ^T.T' and S ^V.V 
and T^\[il] and V ^ ?[IT] and U ^ W. 

3. if P = {vx : X){{a : T) {h: Pi, . . . ,1^-. P^} \ {a : V)<1 . Pq | Q) then 

the declaration a : S occurs in either P or x : X, with S ^ T and T = 

Sz{li:Ti, . . . ,ln'.Tn) and n ^ m and S and V = (B{h'.Vi, . . . ,lr'.Vr) 
and r ^ n and I G {h, . . . ,lr}. 

4- if P = {vx : X)( if athen Pi elseP 2 | Q) then the declaration a : bool occurs 
in either P or x : X. 

If we take a well-typed untagged process and convert it into a tagged process, no 
reduction sequence can lead to a type error. Because every reduction sequence 
from a well-typed untagged process can be matched by a reduction sequence 
from a well-typed tagged process, we conclude that no type errors can result 
from executing a well-typed untagged process. 

5 The POP3 Protocol

As a more substantial example, we will now use our type system to specify the
POP3 protocol [13]. This protocol is typically used by electronic mail software
A = μX.&⟨ quit: ⊕⟨ok: ![str] . end⟩,
          user: ?[str] . ⊕⟨ error: ![str] . X,
                            ok: ![str] . &⟨ quit: ⊕⟨ok: ![str] . end⟩,
                                            pass: ?[str] . ⊕⟨ error: ![str] . X,
                                                              ok: ![str] . T⟩⟩⟩⟩

T = μX.&⟨ stat: ⊕⟨ok: ![int, int] . X⟩,
          retr: ?[int] . ⊕⟨ ok: ![str] . ![str] . X,
                            error: ![str] . X⟩,
          quit: ⊕⟨ok: ![str] . end⟩⟩



Fig. 5. Types for the POP3 protocol



to download new messages from a remote mailbox, so that they can be read and
processed locally; it does not deal with sending messages or routing messages
through a network. A POP3 server requires a client to authenticate itself by
means of the user and pass commands. A client may then use commands such
as stat to obtain the status of the mailbox, retr to retrieve a particular message,
and quit to terminate the session. Some of these commands require additional
information to be sent, for example the number of a message. We have omitted
several POP3 commands from our description, but it is straightforward to fill in
the missing ones.

To specify the behaviour of a channel which can be used for a POP3 session,
we use the type definitions in Figure 5: A describes interactions with the
authentication state, and T describes interactions with the transaction state.
These definitions are for the server side of the channel, and we assume that
there is a ground type str of strings. These definitions illustrate the complex
structure possible for session types, and show the use of recursive types to
describe repetitive session behaviour. The server both offers and makes choices, in
contrast to the example in Section 1. After receiving a command (a label) from
the client, the server can respond with either ok or error (except for the quit
command, which always succeeds and does not allow an error response). The
client implements an interaction of type A, and therefore must offer a choice
between ok and error when waiting for a response to a command. In the published
description of the protocol, ok and error responses are simply strings prefixed
with +OK or -ERR. This does not allow us to replace the corresponding ⊕ by
![str] in the above definitions because the different continuation types after ok
and error are essential for an accurate description of the protocol’s behaviour.
We specify a string message as well, because a POP3 server is allowed to provide
extra information such as an error code.
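
To make the shape of the two types in Figure 5 concrete, the following Python sketch reads them as a transition table and follows a sequence of command and response labels through it. It is an informal illustration only: message payloads are ignored, the state names `A`, `A'` and `T` and the function `run_session` are hypothetical, and the table is a dynamic check rather than the static typing discussed in the paper.

```python
# Server-side reading of Figure 5 with payloads ignored:
# state -> command label -> response label -> next state.
PROTOCOL = {
    "A":  {"quit": {"ok": "end"},
           "user": {"ok": "A'", "error": "A"}},
    "A'": {"quit": {"ok": "end"},
           "pass": {"ok": "T", "error": "A"}},
    "T":  {"stat": {"ok": "T"},
           "retr": {"ok": "T", "error": "T"},
           "quit": {"ok": "end"}},
}


def run_session(trace, state="A"):
    """Follow (command, response) pairs from `state`.

    Returns the state reached, or None if some step is not permitted
    by the table (for example a command sent after the session ended).
    """
    for command, response in trace:
        options = PROTOCOL.get(state, {})
        if command not in options or response not in options[command]:
            return None
        state = options[command][response]
    return state
```

For example, `run_session([("user", "ok"), ("pass", "ok"), ("quit", "ok")])` reaches `"end"`, while a trace that starts with `stat` is rejected because that label is not offered before authentication.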

As in Section 1 we could implement a process POP3body such that x : A ⊢
POP3body. Defining POP3 = port?[x : A^] . POP3body gives port : ^[A^] ⊢ POP3,
which can be published as the specification of the server and its protocol.

The POP3 protocol permits an alternative authentication scheme, accessed
by the apop command, in which the client sends a mailbox name and an
authenticating string simultaneously. This option does not have to be provided,
but a server which does implement it requires a channel of type B, where B is
obtained from A by adding a third option to the first choice:

apop: ?[str, str] . ⊕⟨error: ![str] . X, ok: ![str] . T⟩

A server which implements the apop command can be typed as follows: port :
⊢ newPOP3. Now suppose that client is a POP3 client which does not know
about the apop command. As before, we have client = (νx : A^)port![x].clientbody
where x : ⊢ clientbody. This gives port : ⊢ client. The following
derivation shows that client can also be typed in the environment port : and can
therefore be put in parallel with newPOP3. The key fact is that A ≤ B, because
the top-level options provided by A are a subset of those provided by B.

X : A^ V- clientbody ^ ![A] 

^ T-Out 

port : ^\B ], X : A \- port ! [xl . clientbody 

! T-New 

port : b {vx : A^)port ! [x] . clientbody 

Space does not permit us to present the definition of POP3body, but we claim
that it is simpler and more readable than an equivalent definition in conventional
π-calculus or Pict. The key factor is that the session type of x allows it to be
used for all the messages exchanged in a complete POP3 session. Without session
types, the client has to create a fresh channel every time it needs to send a
message of a different type to the server; these channels also have to be sent
to the server before use, which adds an overhead to every communication and
therefore also to the channel types. Also in this case, the subtype relation on
non-session types does not describe the relationship between interactions with
POP3 and with newPOP3.

6 Conclusions and Future Work 

We have defined a language whose type system incorporates session types, as
suggested by Honda et al. [5,20], and subtyping, based on Pierce and Sangiorgi’s
work [15] and extended to session types. Session channels must be controlled
linearly in order to guarantee that messages go to the correct destinations, and
we have adapted the work of Kobayashi et al. [9] for this purpose. Our language
differs minimally from the π-calculus, the only additions being primitives for
offering and making labelled choices. Unlike Honda et al. we do not introduce
special syntax for establishing and manipulating session channels; everything is
taken care of by the typing rules. We have advocated using a session type as part
of the published specification of a server’s protocol, so that static type-checking
can be used to verify that client implementations behave correctly. Using the
POP3 protocol as an example, we have shown that subtyping increases the utility
of this idea: if upgrading a server causes its protocol to have a session type which
is a supertype of its original session type, then existing client implementations
are still type-correct with respect to the new server.
Session types have some similarities to the types for active objects studied by
Nierstrasz [14] and Puntigam [18,19]. Both incorporate the idea of a type which 
specifies a sequence of interactions. The main difference seems to be that in the 
case of active objects, types can specify interdependencies between interactions 
on different channels. However, the underlying framework (concurrent objects 
with method invocation, instead of channel-based communication between pro- 
cesses) is rather different, and we have not yet made a detailed comparison of 
the two systems. 

The present paper is the first report of our work on a longer term project
to investigate advanced type systems in the context of the Pict [16]
programming language. Our next goal is to extend the Pict compiler to support the type
system presented here. Because Pict is based on the asynchronous π-calculus
[4,6,1] the output prefixing of our language will have to be encoded by explicitly
attaching a continuation channel to each message. Initially we will work with a
non-polymorphic version of Pict; later, after more theoretical study of the
interplay between session types and polymorphism, we will integrate session types
with the full Pict type system including polymorphism. The Pict compiler uses
a powerful partial type inference technique [17], and it will be interesting to see
how it can be extended to handle session types. Because of the value of explicit
session types as specifications, we might not want to allow the programmer to
omit them completely; however, automatic inference of, for example, some usage
annotations will probably be very useful.

The implementation will allow us to gain more experience of programming 
with sessions, which in turn should suggest other typing features which can 
usefully be added to the system — for example, it would be interesting to consider 
Kobayashi’s type system [7,8] for partial deadlock-freedom.
Abstract. A race condition is a situation where two threads manipulate
a data structure simultaneously, without synchronization. Race conditions
are common errors in multithreaded programming. They often
lead to unintended nondeterminism and wrong results. Moreover, they
are notoriously hard to diagnose, and attempts to eliminate them can
introduce deadlocks. In practice, race conditions and deadlocks are often
avoided through prudent programming discipline: protecting each
shared data structure with a lock and imposing a partial order on lock
acquisitions. In this paper we show that this discipline can be captured
(if not completely, to a significant extent) through a set of static rules.
We present these rules as a type system for a concurrent, imperative
language. Although weaker than a full-blown program-verification
calculus, the type system is effective and easy to apply. We emphasize a
core, first-order type system focused on race conditions; we also consider
extensions with polymorphism, existential types, and a partial order on
lock types.



1 Races, Locks, and Types 

Programming with multiple threads introduces a number of pitfalls, such as 
race conditions and deadlocks. A race condition is a situation where two threads 
manipulate a data structure simultaneously, without synchronization. Race con- 
ditions are common, insidious errors in multithreaded programming. They often 
lead to unintended nondeterminism and wrong results. Moreover, since race con- 
ditions are timing-dependent, they are notoriously hard to track down. Attempts 
to eliminate race conditions by using lock-based synchronization can introduce 
other errors, in particular deadlocks. A deadlock occurs when no thread can 
make progress because each is blocked on a lock held by some other thread. 

In practice, both race conditions and deadlocks are often avoided through 
careful programming discipline [5]. Race conditions are avoided by protecting 
each shared data structure with a lock, and accessing the data structure only 
when the protecting lock is held. Deadlocks are avoided by imposing a strict 
partial order on locks and ensuring that each thread acquires locks only in in- 
creasing order. However, this programming discipline is not well supported by 
existing development tools. It is difficult to check if a program adheres to this 
discipline, and easy to write a program that does not by mistake. A single unpro- 
tected access in an otherwise correct program can produce a timing-dependent 
race condition whose cause may take weeks to identify. 
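
The locking half of this discipline can be illustrated with a small Python sketch. It is not taken from the paper: `ProtectedCounter` and `hammer` are hypothetical names, and the point is only that every access to the shared field happens while the protecting lock is held.

```python
import threading


class ProtectedCounter:
    """A shared integer whose every access is guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()   # the lock that protects _value
        self._value = 0

    def increment(self):
        with self._lock:                # read-modify-write only under the lock
            self._value += 1

    def read(self):
        with self._lock:
            return self._value


def hammer(n_threads, n_increments):
    """Increment one counter from several threads and return the final value."""
    counter = ProtectedCounter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

Nothing in Python stops a caller from touching `_value` directly without the lock, which is exactly the kind of unchecked mistake the paragraph above describes.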
In this paper we show that this programming discipline can be captured
through a set of static rules. We present those rules as a type system for a 
concurrent, imperative language. We initially consider a first-order type system 
focused on race conditions. The type system supports dynamic thread creation 
and the dynamic allocation of locks and reference cells. We then consider exten- 
sions, such as universal and existential types, that increase the expressiveness of 
the system. We also outline an extension that eliminates deadlock by enforcing 
a strict partial order on lock acquisitions. 

Since the programming discipline dictates that a thread can access a shared 
data structure only when holding a corresponding lock, our type systems provide 
rules for proving that a thread holds a given lock at a given program point. The 
rules rely on singleton lock types. A singleton lock type is the type of a single 
lock. Therefore, we can represent a lock l at the type level with the singleton
lock type that contains l, and we can assert that a thread holds l by referring to
that type rather than to the lock l. The type of a reference cell mentions both
the type of the contents of the cell and the singleton lock type of the lock that 
protects the cell. Thus, singleton lock types provide a simple way of injecting 
lock values into the type level. 

A set of singleton lock types forms a permission. During typechecking, each 
expression is analyzed in the context of a permission; including a singleton lock 
type in the permission amounts to assuming that the corresponding lock is held 
during evaluation of the expression. In addition, a permission decorates each 
function type and each function definition, representing the set of locks that 
must be held before a function call. 

We study typechecking rather than type inference, so we do not show how 
to infer which lock protects a reference cell or which permission may decorate 
a function definition. We simply assume that the programmer can provide such 
information explicitly, and leave type inference as an open problem. 

There is a significant body of previous work in this area, but most earlier 
approaches are either unsound (i.e., do not detect all race conditions) [22], deal 
only with finite state spaces [7,10,12], or do not handle mainstream shared- 
variable programming paradigms [1,16]. In contrast, we aim to give a sound type 
system for statically verifying the absence of race conditions in a programming 
language with shared variables. We defer a more detailed discussion of related 
work to section 7. 

The next section describes a first-order type system for a concurrent, im- 
perative language. Section 3 presents the operational semantics of the language, 
which is the basis for the race-freeness theorem of section 4. Section 5 extends 
the type system with universal and existential types. Section 6 further extends 
the type system in order to prevent deadlocks. Section 7 discusses related work. 
We conclude with section 8. For the sake of brevity, we omit proofs. Moreover, 
we give the type systems of sections 5 and 6 without corresponding operational 
semantics and correctness theorems. The operational semantics are straight- 
forward. To date, we have studied how to extend the correctness theorem of 
section 4 to the type systems of section 5, but only partially to that of section 6. 
V ∈ Value = c | x | λ^p x:t. e
c ∈ Const = unit
x, y ∈ Var

e ∈ Exp = V
        | e e
        | ref_m e | !e | e := e
        | fork e
        | new-lock x : m in e
        | sync e e

s, t ∈ Type = Unit
            | t →^p t
            | Ref_m t
            | m

m, n, o ∈ TypeVar

p, q ∈ Permission = P(TypeVar)



Fig. 1. A concurrent, imperative language.



2 First-order Types against Races 

We start by considering a first-order type system focused on race conditions, 
and defer deadlock prevention to section 6. We formulate our type system for 
the concurrent, imperative language described in figure 1. The language is
call-by-value, and includes values (constants, variables, and function definitions),
applications, and the usual imperative operations on reference cells: allocation, 
dereferencing, and assignment. Although the language does not include explicit 
support for recursion, recursive functions can be encoded using reference cells, 
as described in section 2.2 below. 

The language allows multithreaded programs by including the operation 
fork e which spawns a new thread for the evaluation of e. This evaluation is 
performed only for its effect; the result of e is never used. Locks are provided 
for thread synchronization. A lock has two states, locked and unlocked, and 
is initially unlocked. The expression new-lock x : m in e dynamically allocates 
a new lock, binds x to that lock, and then evaluates e. It also introduces the 
type variable m which denotes the singleton lock type of the new lock. The 
expression sync e1 e2 is evaluated in a manner similar to Java’s synchronized
statement [14]: the subexpression e1 is evaluated first, and should yield a lock,
which is then acquired; the subexpression e2 is then evaluated; and finally the
lock is released. The result of e2 is returned as the result of the sync expression.
While evaluating e2, the current thread is said to hold the lock, or, equivalently,
is in a critical section on the lock. Any other thread that attempts to acquire the 
lock blocks until the lock is released. Locks are not reentrant; that is, a thread 
cannot reacquire a lock that it already holds. A new thread does not inherit 
locks held by its parent thread. 
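
For readers who think in terms of a mainstream library, the evaluation order of sync can be mimicked in Python as follows. This is only an analogy: the helper `sync` is hypothetical, the body is passed as a zero-argument function to delay its evaluation, and `threading.Lock` is used because, like the locks of the language, it is not reentrant.

```python
import threading


def sync(lock, body):
    """Acquire `lock`, evaluate `body()`, release the lock, return the result."""
    lock.acquire()
    try:
        return body()
    finally:
        lock.release()      # released even if the body raises


lock = threading.Lock()

# Inside the body the lock is held, so a second non-blocking attempt fails.
held_inside = sync(lock, lambda: lock.locked())
reacquired_inside = sync(lock, lambda: lock.acquire(blocking=False))
```

A blocking re-acquisition inside the body would hang the thread, which matches the non-reentrant behaviour described above.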



2.1 The Type Rules 

The type of an expression depends on a typing environment E, which maps 
program variables to types, and maps type variables to the kind Lock (the kind 









Cormac Flanagan and Martin Abadi 



Judgments

E ⊢ ◇            E is a well-formed typing environment
E ⊢ t            t is a well-formed type in E
E ⊢ p            p is a well-formed permission in E
E ⊢ s <: t       s is a subtype of t in E
E ⊢ p <: q       p is a subpermission of q in E
E; p ⊢ e : t     e is a well-typed expression of type t in E with p

Rules

(Env ∅)       ∅ ⊢ ◇
(Env x)       E ⊢ t    x ∉ dom(E)  ⟹  E, x : t ⊢ ◇
(Env m)       E ⊢ ◇    m ∉ dom(E)  ⟹  E, m :: Lock ⊢ ◇
(Type Unit)   E ⊢ ◇  ⟹  E ⊢ Unit
(Type Fun)    E ⊢ s    E ⊢ t    E ⊢ p  ⟹  E ⊢ s →^p t
(Type Ref)    E ⊢ t    E ⊢ m  ⟹  E ⊢ Ref_m t
(Type Lock)   E, m :: Lock, E' ⊢ ◇  ⟹  E, m :: Lock, E' ⊢ m
(Perm)        E ⊢ ◇    E ⊢ m for all m ∈ p  ⟹  E ⊢ p
(Subperm)     E ⊢ p    E ⊢ q    p ⊆ q  ⟹  E ⊢ p <: q
(Sub Refl)    E ⊢ t  ⟹  E ⊢ t <: t
(Sub Fun)     E ⊢ s1 <: t1    E ⊢ t2 <: s2    E ⊢ p <: q
              ⟹  E ⊢ (t1 →^p t2) <: (s1 →^q s2)

              E ⊢ ◇  ⟹  E; ∅ ⊢ unit : Unit
              E, x : t, E' ⊢ ◇  ⟹  E, x : t, E'; ∅ ⊢ x : t
(Exp Fun)     E, x : s; p ⊢ e : t  ⟹  E; ∅ ⊢ λ^p x:s. e : s →^p t
(Exp Appl)    E; p ⊢ e1 : s →^p t    E; p ⊢ e2 : s  ⟹  E; p ⊢ e1 e2 : t
(Exp Ref)     E ⊢ m    E; p ⊢ e : t  ⟹  E; p ⊢ ref_m e : Ref_m t
(Exp Deref)   E; p ⊢ e : Ref_m t    m ∈ p  ⟹  E; p ⊢ !e : t
(Exp Set)     E; p ⊢ e1 : Ref_m t    E; p ⊢ e2 : t    m ∈ p
              ⟹  E; p ⊢ e1 := e2 : Unit
(Exp Fork)    E; ∅ ⊢ e : t  ⟹  E; ∅ ⊢ fork e : Unit
(Exp Lock)    E, m :: Lock, x : m; p ⊢ e : t    E ⊢ p    E ⊢ t
              ⟹  E; p ⊢ new-lock x:m in e : t
(Exp Sync)    E; p ⊢ e1 : m    E; p ∪ {m} ⊢ e2 : t  ⟹  E; p ⊢ sync e1 e2 : t
(Exp Sub)     E; p ⊢ e : t    E ⊢ p <: q    E ⊢ t <: s  ⟹  E; q ⊢ e : s



Fig. 2. The first-order type system.
of singleton lock types). The typing environment is organized as a sequence of
bindings, and we use ∅ to denote the empty environment.

E ::= ∅ | E, x : t | E, m :: Lock

We define the type system using six judgments. These judgments are
described in figure 2, together with rules for reasoning about the judgments. The
core of the type system is the set of rules for the judgment E; p ⊢ e : t (read
“e is a well-typed expression of type t in typing environment E with
permission p”). Our intent is that, if this judgment holds, then e is race-free and yields
values of type t, provided the current thread holds at least the locks described
by p, and the free variables of e are given bindings consistent with the typing
environment E.

The rule (Exp Fun) for functions λ^p x : s. e checks that e is race-free given
permission p, and then records this permission as part of the function’s type:
s →^p t. The rule (Exp Appl) ensures that this permission is available at each call
site of the function. The rule (Exp Ref) records the singleton lock type of the
protecting lock as part of each reference-cell type: Ref_m t. The rules (Exp Deref)
and (Exp Set) ensure that this lock is held (i.e., is in the current permission)
whenever the reference cell is accessed. A single lock may protect several
reference cells; in an obvious extension of our language, it could protect an entire
record or object.

The rule (Exp Fork) typechecks the spawned expression using the empty
permission, since threads never inherit locks from their parents. The rule (Exp Lock)
for new-lock x:m in e requires the type t of e to be well-formed in the original
typing environment (E ⊢ t). This requirement implies that t cannot contain the
type variable m, and hence the new-lock expression cannot return the newly
allocated lock. This constraint suffices to ensure that different singleton lock types
of the same name are not confused. It is somewhat restrictive, but it can be
circumvented with existential types, as described in section 5.2 below.

The rule (Exp Sync) for sync e1 e2 requires that e1 yield a value of some
singleton lock type m, and then typechecks e2 with an extended permission that
includes the type of the newly acquired lock. The use of this synchronization
construct ensures that lock acquisition and release operations follow a stack-like
discipline, which significantly simplifies the development of the type system.

The rule (Exp Sub) allows for subsumption on both types and permissions.
If E ⊢ p <: q, then any expression that is race-free with permission p is also
race-free with the superset q of p.
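
The way permissions flow through these rules can be sketched as a small checker in Python. This is a simplified illustration rather than the paper's system: expressions and types are encoded as tuples chosen for this example, type equality stands in for subtyping, subsumption on permissions is folded into the application case, and well-formedness of type annotations is not checked. All function names are hypothetical.

```python
def is_tag(t, tag):
    return isinstance(t, tuple) and t[0] == tag


def mentions(t, m):
    """True if lock type variable m occurs anywhere in type t."""
    if is_tag(t, "lock"):                     # ("lock", m): singleton lock type
        return t[1] == m
    if is_tag(t, "ref"):                      # ("ref", contents, lock_var)
        return t[2] == m or mentions(t[1], m)
    if is_tag(t, "fun"):                      # ("fun", arg, perm, result)
        return m in t[2] or mentions(t[1], m) or mentions(t[3], m)
    return False                              # "Unit"


def check(env, locks, perm, e):
    """Return the type of e under permission perm, or raise TypeError."""
    tag = e[0]
    if tag == "unit":
        return "Unit"
    if tag == "var":
        return env[e[1]]
    if tag == "lam":                          # ("lam", perm, x, arg_type, body)
        _, p, x, s, body = e
        p = frozenset(p)
        return ("fun", s, p, check({**env, x: s}, locks, p, body))
    if tag == "app":
        f = check(env, locks, perm, e[1])
        a = check(env, locks, perm, e[2])
        if not is_tag(f, "fun") or a != f[1]:
            raise TypeError("ill-typed application")
        if not f[2] <= perm:                  # the function's permission must be held
            raise TypeError("call site lacks required locks")
        return f[3]
    if tag == "ref":                          # ("ref", lock_var, init)
        if e[1] not in locks:
            raise TypeError("unknown lock type variable")
        return ("ref", check(env, locks, perm, e[2]), e[1])
    if tag in ("deref", "set"):
        r = check(env, locks, perm, e[1])
        if not is_tag(r, "ref"):
            raise TypeError("not a reference cell")
        if r[2] not in perm:                  # protecting lock must be held
            raise TypeError("protecting lock is not held")
        if tag == "deref":
            return r[1]
        if check(env, locks, perm, e[2]) != r[1]:
            raise TypeError("assigned value has the wrong type")
        return "Unit"
    if tag == "fork":                         # new thread starts with no locks
        check(env, locks, frozenset(), e[1])
        return "Unit"
    if tag == "newlock":                      # ("newlock", x, m, body)
        _, x, m, body = e
        if m in locks:
            raise TypeError("lock type variable already in scope")
        t = check({**env, x: ("lock", m)}, locks | {m}, perm, body)
        if mentions(t, m):                    # the new lock must not escape
            raise TypeError("result type mentions the new lock")
        return t
    if tag == "sync":                         # ("sync", lock_expr, body)
        l = check(env, locks, perm, e[1])
        if not is_tag(l, "lock"):
            raise TypeError("sync expects a lock")
        return check(env, locks, perm | {l[1]}, e[2])
    raise TypeError("unknown expression")


def well_typed(e):
    try:
        check({}, frozenset(), frozenset(), e)
        return True
    except (TypeError, KeyError):
        return False
```

For instance, `("newlock", "x", "m", ("sync", ("var", "x"), ("deref", ("ref", "m", ("unit",)))))` is accepted, whereas the same dereference outside the sync, or inside a fork within the sync, is rejected.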



2.2 Examples 

For clarity, we present example programs using an extended language with
integers, let-expressions, and a sequential composition operator (;). The program
P1 is a trivial example of using locks; it first allocates a lock and a reference
cell protected by that lock, and then it acquires the lock and dereferences the
cell. The program P2 is slightly more complicated. It first allocates a lock and
defines a function g that increments reference cells protected by that lock. It
then allocates two such reference cells, and uses g to increment both of them.
The type of g is (Ref_m Int) →^{m} Int; it expresses that the protecting lock
should be acquired before g is called.

P1 = new-lock x : m in
       let y = ref_m 1 in
       sync x !y

P2 = new-lock x : m in
       let g = λ^{m} z : Ref_m Int. z := !z + 1
           y1 = ref_m 1
           y2 = ref_m 2
       in sync x (g y1; g y2)

Although the language does not include explicit support for recursion, we can 
encode recursion using reference cells. This idea is illustrated by the following 
program, which implements a server that repeatedly handles incoming requests. 
The core of this server is a recursive function that first allocates a new lock $x_2$
and associated reference cell $y_2$, then uses $y_2$ in handling an incoming request,
and finally calls itself recursively to handle the next incoming request.



    new-lock x1 : m1 in                        ; Allocate a lock and a ref cell
    let y1 = ref_{m1} (λ^∅ x : Unit. x) in     ; initialized to the identity function.
    sync x1                                    ; Acquire the lock and set the ref cell
      y1 := λ^{m1} x : Unit.                   ; to the recursive function.
              new-lock x2 : m2 in              ; Allocate a local lock
              let y2 = ref_{m2} 0 in           ; and a local ref cell.
              ...                              ; Use the local ref cell.
              (!y1 unit);                      ; Call the recursive function.
      (!y1 unit)                               ; Start the server running.

The type variable $m_2$ denotes different singleton lock types at different stages of
the execution, but these different types are not confused by the type system.



2.3 Expressiveness 

Although the examples shown above are necessarily simple, the first-order type 
system is sufficiently expressive to verify a variety of non-trivial programs. In 
particular, any sequential program that is well-typed in the underlying sequential
type system has an annotated variant that is well-typed in our type system.
This annotated variant is obtained by enclosing the original program in
the context new-lock x : m in sync x [ ] (which allocates and acquires a new 
lock of type m), annotating each reference cell with the type m, and annotat- 
ing each function definition with the permission {m}. This approach can be 
generalized to multithreaded programs with a coarse locking strategy based on 
several global locks. It also suggests how to annotate thread-local data: writing 
fork {new-lock x : m in sync x e) for spawning e and using x as the lock for 
protecting thread-local reference cells in e. 

The type system does have some significant restrictions: functions cannot
abstract over lock types, and the new-lock construct cannot return the newly
allocated lock; these restrictions are overcome in section 5. In addition, our type
system is somewhat over-conservative in that it does not allow simultaneous
reads of a data structure, even though simultaneous reads are not normally
considered race conditions, and in fact many programs use reader-writer locks
to permit such simultaneous reads [5]. We believe that adding a treatment of
reader-writer locks to our type system should not be difficult.

3 Operational Semantics 

We specify the operational semantics of our language using the abstract machine 
described in figure 3. The machine evaluates a program by stepping through a 
sequence of states. A state consists of three components: a lock store, a reference 
store, and a collection of expressions, each of which represents a thread. The
expressions are written in a slight extension of the language Exp, called $\mathit{Exp}_r$,
which includes the new construct in-sync. Since the result of the program is the
result of its initial thread, the order of the threads in a state matters; therefore,
we organize the threads as a sequence, and the first thread in the sequence is
always the initial thread. We use the notation $T_i$ to mean the $i$th element of a
thread sequence $T$, where the initial thread is at index 0, and we use $T.T'$ to
denote the concatenation of two sequences.

Reference cells are kept in a reference store $\sigma$, which maps reference locations
to values. Locks are kept in a lock store $\pi$, which maps lock locations to either
0 or 1; $\pi(l) = 1$ when the lock $l$ is held by some thread. Reference locations
and lock locations are simply special kinds of variables that can be bound only
by the respective stores. For each lock location $l$, we introduce a type variable
$o_l$ to denote the corresponding singleton lock type. A lock store that binds a
lock location $l$ also implicitly binds the corresponding type variable $o_l$ with kind
Lock; the only value of type $o_l$ is $l$.

The evaluation of a program starts in an initial state with empty lock and 
reference stores and with a single thread. Evaluation then takes place according 
to the machine’s transition rules. These rules specify the behavior of the various 
constructs in the language. The evaluation terminates once all threads have been 
reduced to values, in which case the value of the initial thread is returned as the 
result of the program. We use the notation $e[V/x]$ to denote the capture-free
substitution of $V$ for $x$ in $e$, and use $\sigma[r \mapsto V]$ to denote the store that agrees
with $\sigma$ except at $r$, which is mapped to $V$.

The transition rules are mostly straightforward. The only unusual rules are
the ones for lock creation and for sync expressions. To evaluate the expression
new-lock x:m in e, the transition rule (Trans Lock) allocates a new lock location
$l$ and replaces occurrences of $x$ in $e$ with $l$. The rule also replaces occurrences of
$m$ in $e$ with the type variable $o_l$. To evaluate the expression sync $l$ $e$, the
transition rule (Trans Sync) acquires the lock $l$ and yields the term in-sync $l$ $e$. This
term denotes that the lock $l$ has been acquired and that the subexpression $e$ is
currently being evaluated. Since in-sync $l$ [ ] is an evaluation context, subsequent
transitions evaluate the subexpression $e$. Once this subexpression yields a value,





Cormac Flanagan and Martin Abadi 



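Evaluator

\[
\begin{array}{l}
\mathit{eval} \subseteq \mathit{Exp} \times \mathit{Value} \\
\mathit{eval}(e, V) \iff (\emptyset, \emptyset, e) \longmapsto^{*} (\pi, \sigma, V.\text{unit}.\cdots.\text{unit})
\end{array}
\]

State space

\[
\begin{array}{rcl}
S \in \mathit{State} & = & \mathit{LockStore} \times \mathit{RefStore} \times \mathit{ThreadSeq} \\
\pi \in \mathit{LockStore} & = & \mathit{LockLoc} \rightharpoonup \{0, 1\} \\
\sigma \in \mathit{RefStore} & = & \mathit{RefLoc} \rightharpoonup \mathit{Value} \\
l \in \mathit{LockLoc} & \subseteq & \mathit{Var} \\
r \in \mathit{RefLoc} & \subseteq & \mathit{Var} \\
T \in \mathit{ThreadSeq} & = & \mathit{Exp}_r^{*} \\
f \in \mathit{Exp}_r & = & V \mid f\ e \mid V\ f \mid \text{ref}_m\, f \mid {!f} \mid f := e \mid r := f \\
 & \mid & \text{fork } e \mid \text{new-lock } x{:}m \text{ in } e \\
 & \mid & \text{sync } f\ e \mid \text{in-sync } l\ f
\end{array}
\]

Evaluation contexts

\[
\begin{array}{rcl}
\mathcal{E} & = & [\,] \mid \mathcal{E}\ e \mid V\ \mathcal{E} \mid \text{ref}_m\, \mathcal{E} \mid {!\mathcal{E}} \mid \mathcal{E} := e \mid r := \mathcal{E} \\
 & \mid & \text{sync } \mathcal{E}\ e \mid \text{in-sync } l\ \mathcal{E}
\end{array}
\]

Transition rules

\[
\begin{array}{rcll}
(\pi, \sigma, T.\mathcal{E}[\,(\lambda^{p} x{:}t.\ e)\ V\,].T') & \longmapsto & (\pi, \sigma, T.\mathcal{E}[\,e[V/x]\,].T') & \text{(Trans Appl)} \\
(\pi, \sigma, T.\mathcal{E}[\,\text{ref}_m\, V\,].T') & \longmapsto & (\pi, \sigma[r \mapsto V], T.\mathcal{E}[\,r\,].T') \quad \text{if } r \notin \mathit{dom}(\sigma) & \text{(Trans Ref)} \\
(\pi, \sigma, T.\mathcal{E}[\,!r\,].T') & \longmapsto & (\pi, \sigma, T.\mathcal{E}[\,V\,].T') \quad \text{if } \sigma(r) = V & \text{(Trans Deref)} \\
(\pi, \sigma, T.\mathcal{E}[\,r := V\,].T') & \longmapsto & (\pi, \sigma[r \mapsto V], T.\mathcal{E}[\,\text{unit}\,].T') & \text{(Trans Set)} \\
(\pi, \sigma, T.\mathcal{E}[\,\text{new-lock } x{:}m \text{ in } e\,].T') & \longmapsto & (\pi[l \mapsto 0], \sigma, T.\mathcal{E}[\,e[l/x, o_l/m]\,].T') \quad \text{if } l \notin \mathit{dom}(\pi) & \text{(Trans Lock)} \\
(\pi[l \mapsto 0], \sigma, T.\mathcal{E}[\,\text{sync } l\ e\,].T') & \longmapsto & (\pi[l \mapsto 1], \sigma, T.\mathcal{E}[\,\text{in-sync } l\ e\,].T') & \text{(Trans Sync)} \\
(\pi[l \mapsto 1], \sigma, T.\mathcal{E}[\,\text{in-sync } l\ V\,].T') & \longmapsto & (\pi[l \mapsto 0], \sigma, T.\mathcal{E}[\,V\,].T') & \text{(Trans In-Sync)} \\
(\pi, \sigma, T.\mathcal{E}[\,\text{fork } e\,].T') & \longmapsto & (\pi, \sigma, T.\mathcal{E}[\,\text{unit}\,].T'.e) & \text{(Trans Fork)}
\end{array}
\]

Fig. 3. The abstract machine.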
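Judgment

\[
\vdash S : t \qquad S \text{ is a well-typed state of type } t
\]

Rules

\[
\dfrac{
\begin{array}{c}
\mathit{dom}(\pi) = \{l_1, \ldots, l_j\} \qquad \mathit{dom}(\sigma) = \{r_1, \ldots, r_k\} \\
E = o_{l_1}{::}\mathit{Lock},\ l_1 : o_{l_1},\ \ldots,\ o_{l_j}{::}\mathit{Lock},\ l_j : o_{l_j},\ r_1 : \mathit{Ref}_{m_1} s_1,\ \ldots,\ r_k : \mathit{Ref}_{m_k} s_k \\
\forall i \in 1..k.\ E; \emptyset \vdash \sigma(r_i) : s_i \\
|T| > 0 \qquad \forall i < |T|.\ E; \emptyset \vdash T_i : t_i
\end{array}
}{\vdash (\pi, \sigma, T) : t_0}
\]

\[
\dfrac{E \vdash l : m \qquad E; p \cup \{m\} \vdash f : t}{E; p \vdash \text{in-sync } l\ f : t}
\]

Fig. 4. Additional judgment and rules for typing states.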



the transition rule (Trans In-Sync) releases the lock and returns that value as
the result of the original sync expression. We say that an expression $f$ is in a
critical section on a lock location $l$ if $f = \mathcal{E}[\,\text{in-sync } l\ f'\,]$ for some evaluation
context $\mathcal{E}$ and expression $f'$.
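
The effect of (Trans Sync) and (Trans In-Sync) on the lock store can be sketched
in Python as follows. This is only an illustration of the lock-store component:
the dictionary representation of the lock store and the function names
`trans_sync` and `trans_in_sync` are assumptions made for this sketch, and thread
expressions and the reference store are omitted.

```python
def trans_sync(pi, lock):
    """Lock-store part of (Trans Sync): enabled only when the lock is free (0).

    Returns the updated lock store, or None if the transition is not enabled.
    """
    if pi.get(lock) != 0:
        return None              # lock is held or unallocated: no step possible
    return {**pi, lock: 1}


def trans_in_sync(pi, lock):
    """Lock-store part of (Trans In-Sync): release a held lock (1 becomes 0)."""
    if pi.get(lock) != 1:
        return None
    return {**pi, lock: 0}
```

A thread whose next step is a sync on a held lock simply has no enabled
transition in this sketch, which matches the blocking behavior of the machine.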

The machine arbitrarily interleaves the execution of threads. Since different 
interleavings may yield different results, the evaluator eval is a proper relation 
and not simply a partial function. 

We use the semantics to formalize the notion of a race condition. An expression
$f$ accesses a reference location $r$ if there exists some evaluation context $\mathcal{E}$
such that $f = \mathcal{E}[\,!r\,]$ or $f = \mathcal{E}[\,r := V\,]$. A state has a race condition if its thread
sequence contains two expressions that access the same reference location. A
program $e$ has a race condition if its evaluation may yield a state with a race
condition, that is, if there exists a state $S$ such that $(\emptyset, \emptyset, e) \longmapsto^{*} S$ and $S$ has a
race condition.
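
The definition of a race condition on a single state can be illustrated with a
small Python sketch. It assumes a simplified view in which each thread is
summarized by the reference location it is about to read or write (or None if its
next step is not an access); the function name `has_race` and this representation
are hypothetical and chosen only for the example.

```python
from collections import Counter


def has_race(accesses):
    """Return True if two threads access the same reference location.

    `accesses[i]` is the location accessed by thread i in the current state,
    or None if thread i does not access any reference location.
    """
    counts = Counter(r for r in accesses if r is not None)
    return any(n >= 2 for n in counts.values())
```

Because an evaluation context isolates a single redex, each thread accesses at
most one location per state, so one entry per thread suffices in this sketch.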

4 Well-typed Programs Don’t Have Races 

The fundamental property of the type system is that well-typed programs do 
not have race conditions. The first component of the proof of this property is a 
subject reduction result stating that typing is preserved during evaluation. To 
prove this result, we extend typing judgments from expressions in Exp to
expressions in $\mathit{Exp}_r$ and then to machine states as shown in figure 4. The judgment
$\vdash S : t$ says that $S$ is a well-typed state yielding values of type $t$.

Lemma 1 (Subject Reduction). If $\vdash S : t$ and $S \longmapsto S'$, then $\vdash S' : t$.

Independently of the type system, locks provide mutual exclusion, in that two
threads can never be in a critical section on the same lock. The judgment $\vdash_{cs} S$
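Judgments

\[
\begin{array}{ll}
M \vdash_{cs} f & f \text{ has exactly one critical section for each lock in } M \\
\vdash_{cs} S & S \text{ is well-formed with respect to critical sections}
\end{array}
\]

Rules

\[
\dfrac{f = V \mid \text{fork } e \mid \text{new-lock } x{:}m \text{ in } e}{\emptyset \vdash_{cs} f}
\qquad
\dfrac{M \vdash_{cs} f}{M \uplus \{l\} \vdash_{cs} \text{in-sync } l\ f}
\]

\[
\dfrac{
\begin{array}{c}
M \vdash_{cs} f \\
f' = f\ e \mid V\ f \mid \text{ref}_m\, f \mid {!f} \mid f := e \mid r := f \mid \text{sync } f\ e
\end{array}
}{M \vdash_{cs} f'}
\qquad
\dfrac{
\begin{array}{c}
\forall i < |T|.\ M_i \vdash_{cs} T_i \\
M = M_0 \uplus \cdots \uplus M_{|T|-1} \\
\forall l \in M.\ \pi(l) = 1
\end{array}
}{\vdash_{cs} (\pi, \sigma, T)}
\]

Fig. 5. Additional judgments and rules for reasoning about critical sections.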



says that at most one thread is in a critical section on each lock in S (see figure 5).
According to Lemma 2, the property $\vdash_{cs} S$ is maintained during evaluation.

Lemma 2 (Mutual Exclusion). If $\vdash_{cs} S$ and $S \longmapsto S'$, then $\vdash_{cs} S'$.
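
As an informal illustration of the well-formedness condition of figure 5, the
following Python sketch checks a summarized state. It assumes that each thread is
represented by the list of locks on which it is in a critical section, and that
the lock store is a dictionary from lock names to 0 or 1; the name
`well_formed_cs` and this encoding are assumptions made for the example.

```python
def well_formed_cs(pi, thread_locks):
    """Check the critical-section invariant on a summarized state.

    `thread_locks[i]` lists the locks on which thread i is in a critical
    section. The invariant holds if no lock occurs twice overall (the union
    is disjoint) and every such lock is marked as held (1) in the store `pi`.
    """
    seen = set()
    for locks in thread_locks:
        for lock in locks:
            if lock in seen or pi.get(lock) != 1:
                return False
            seen.add(lock)
    return True
```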

Lemma 3 says that a well-typed thread accesses a reference cell only when it 
holds the protecting lock. 

Lemma 3. Suppose that $E; \emptyset \vdash f : t$ and $f$ accesses reference location $r$. Then
$E; \emptyset \vdash r : \mathit{Ref}_m\, s$ for some lock type $m$ and type $s$. Furthermore, there exists lock
location $l$ such that $E; \emptyset \vdash l : m$ and $f$ is in a critical section on $l$.

This lemma implies that states that are well-typed and well-formed with
respect to critical sections do not have race conditions.

Lemma 4. Suppose $\vdash S : t$ and $\vdash_{cs} S$. Then $S$ does not have a race condition.

We conclude that well-typed programs do not have race conditions.

Theorem 1. If $\emptyset; \emptyset \vdash e : t$ then $e$ does not have a race condition.



5 Second-order Types against Races 

Although the first-order type system of section 2 is applicable to a variety of 
multithreaded programs, there are many race-free programs that it cannot verify.
This section describes extensions that allow the verification of more complex 
programs. These extensions rely on polymorphism and type abstraction, which 
are fairly easy to incorporate into a type-based approach such as ours. 






Types for Safe Locking 101 



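Extended syntax

\[
\begin{array}{rcl}
V \in \mathit{Value} & = & \ldots \mid \Lambda m{::}\mathit{Lock}.\ V \\
e \in \mathit{Exp} & = & \ldots \mid e[m] \\
s, t \in \mathit{Type} & = & \ldots \mid \forall m{::}\mathit{Lock}.\ t
\end{array}
\]

Additional rules

\[
\dfrac{E,\ m{::}\mathit{Lock} \vdash t}{E \vdash \forall m{::}\mathit{Lock}.\ t}
\qquad
\dfrac{E,\ m{::}\mathit{Lock}; \emptyset \vdash V : t}{E; \emptyset \vdash \Lambda m{::}\mathit{Lock}.\ V : (\forall m{::}\mathit{Lock}.\ t)}
\]

\[
\dfrac{E,\ m{::}\mathit{Lock} \vdash s <: t}{E \vdash (\forall m{::}\mathit{Lock}.\ s) <: (\forall m{::}\mathit{Lock}.\ t)}
\qquad
\dfrac{E; p \vdash e : (\forall m{::}\mathit{Lock}.\ t) \qquad E \vdash n}{E; p \vdash e[n] : t[n/m]}
\]

Fig. 6. Extending the first-order type system with universal types.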



5.1 Polymorphism over Lock Types 

The first-order type system does not permit functions parameterized by lock 
types. To overcome this limitation, we extend the type system to include poly- 
morphism, as described in figure 6. The only unusual aspect of this extension 
is that we require the body of a polymorphic abstraction to be a value. This 
restriction avoids the need to annotate polymorphic abstractions with permis- 
sions, since the trivial evaluation of a value requires only the empty permission. 
(If needed, we can still include a non-value expression in a polymorphic
abstraction by wrapping the expression in a function definition.)



Examples The following program P3 defines a polymorphic function for
incrementing a reference cell. The function abstracts over both the reference cell and
the type of the lock protecting the reference cell, and the caller is responsible for
acquiring that lock. The program P4 is similar, except that the lock is acquired
inside the increment function.

    P3 = let g = Λn::Lock.
                   λ^{n} z : Ref_n Int.
                     z := !z + 1
         in new-lock x:m in
            let y = ref_m 0 in
            sync x (g[m] y)

    P4 = let g = Λn::Lock.
                   λ^∅ w : n.
                   λ^∅ z : Ref_n Int.
                     sync w (z := !z + 1)
         in new-lock x:m in
            let y = ref_m 0 in
            g[m] x y



5.2 Existential Quantification over Lock Types 

All our type systems require that the result type of new-lock x:m in e be a
well-formed type in the environment of the new-lock expression. This requirement
forbids returning the type variable m out of the scope of its binding new-lock 
expression, and hence unfortunately excludes some useful programming patterns. 
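Extended syntax

\[
\begin{array}{rcl}
V \in \mathit{Value} & = & \ldots \mid \text{pack } m{::}\mathit{Lock} = n \text{ with } V \\
e \in \mathit{Exp} & = & \ldots \mid \text{open } e \text{ as } m{::}\mathit{Lock},\ x : t \text{ in } e \\
s, t \in \mathit{Type} & = & \ldots \mid \exists m{::}\mathit{Lock}.\ t
\end{array}
\]

Additional rules

\[
\dfrac{E,\ m{::}\mathit{Lock} \vdash t}{E \vdash \exists m{::}\mathit{Lock}.\ t}
\qquad
\dfrac{E,\ m{::}\mathit{Lock} \vdash s <: t}{E \vdash (\exists m{::}\mathit{Lock}.\ s) <: (\exists m{::}\mathit{Lock}.\ t)}
\]

\[
\dfrac{E \vdash n \qquad E; \emptyset \vdash V[n/m] : t[n/m]}{E; \emptyset \vdash \text{pack } m{::}\mathit{Lock} = n \text{ with } V : (\exists m{::}\mathit{Lock}.\ t)}
\]

\[
\dfrac{
\begin{array}{c}
E; p \vdash e_1 : (\exists m{::}\mathit{Lock}.\ t) \\
E,\ m{::}\mathit{Lock},\ x : t; p \vdash e_2 : s \\
E \vdash s
\end{array}
}{E; p \vdash \text{open } e_1 \text{ as } m{::}\mathit{Lock},\ x : t \text{ in } e_2 : s}
\]

Fig. 7. Extending the first-order type system with existential types.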



For example, consider a multithreaded implementation of binary trees. To reduce 
lock contention, the implementation may protect each node with a separate lock. 
The node allocation routine thus needs to create a fresh lock, say of type m, and 
return a new node of type $m \times \mathit{Ref}_m\,\alpha \times \mathit{Ref}_m\,\alpha$, where $\alpha$ is the type of the node’s
children. But including m in the return type implies lifting it out of its binding 
new-lock expression, and is forbidden by the type system. 

We circumvent this restriction by noting that the caller of the allocation 
routine does not care which singleton lock type is used to protect the node. The 
caller requires only that there exist some lock type m such that the node has type 
$m \times \mathit{Ref}_m\,\alpha \times \mathit{Ref}_m\,\alpha$. This insight suggests the use of existential types for typing
such programs. It is straightforward to extend the type system with existential
types, as outlined in figure 7. The type rules closely follow the conventional
rules for existential types [6]. In the rule for pack, the lock type $n$ is hidden and
replaced with the type variable $m$ in the resulting existential type $(\exists m{::}\mathit{Lock}.\ t)$.
We do not explicitly allow for renaming in the rule for open, since renaming can
be accomplished using $\alpha$-conversion, if necessary.



Examples Some of the following examples use product, sum, and recursive
types, which are easily added to the type system, as in [6]. Values of these
additional types are manipulated by the following operations: $(e_1, \ldots, e_n)$ creates
a value of a product type, whose components are retrieved by the operations first,
second, etc.; inLeft creates a value of a sum type, whose component is retrieved
by asLeft; inRight and asRight behave in a similar manner; and fold and unfold
convert between a recursive type $\mu\alpha.\ t$ and its unfolding $t[(\mu\alpha.\ t)/\alpha]$.

The following expression P5 provides a simple example of using existential
types. This expression has type $\exists m{::}\mathit{Lock}.\ (m \times \mathit{Ref}_m\,\mathit{Int})$, and it returns a pair
consisting of a lock and a reference cell protected by that lock. The expression
P6 opens P5 and retrieves the value of the reference cell.

    P5 = new-lock x : n in
         let y = (x, (ref_n 0)) in
         pack m::Lock = n with y

    P6 = open P5 as
           m::Lock, y : (m × Ref_m Int) in
         sync first(y) !second(y)

For a more realistic example, we reconsider how to implement binary trees 
using a separate lock to protect each node. In this implementation, a leaf node 
is represented as unit, and an interior node is represented as a triple, using an 
existential type to hide the type of the protecting lock. The type of a binary tree 
is thus: 

\[
T = \mu\alpha.\ (\mathit{Unit} + \exists m{::}\mathit{Lock}.\ (m \times \mathit{Ref}_m\,\alpha \times \mathit{Ref}_m\,\alpha))
\]

Some typical routines for manipulating binary trees are: 

    leaf : T = fold(inLeft(unit))

    alloc-node : T →^∅ (T →^∅ T) = λ^∅ l : T.
                                   λ^∅ r : T.
                                     new-lock x:n in
                                     pack m::Lock = n with
                                       fold(inRight((x, ref_n l, ref_n r)))

    left-child : T →^∅ T = λ^∅ x : T.
                             open asRight(unfold(x))
                               as m::Lock, y : (m × Ref_m T × Ref_m T)
                             in sync first(y) !second(y)



6 Preventing Deadlocks 

The type systems described so far ensure that well-typed programs do not have 
race conditions. However, these programs may still suffer from other errors, in 
particular deadlocks. 

We formalize the notion of deadlock using the operational semantics of figure 3.
An expression $f$ requests a lock $l$ if $f = \mathcal{E}[\,\text{sync } l\ e\,]$. A state is deadlocked
if there is a cycle of threads in the state such that each thread requests a lock
held by the next thread in the cycle. More precisely, a state $S = (\pi, \sigma, T)$ is
deadlocked if there exist lock locations $l_0, \ldots, l_n$ and indices $d_0, \ldots, d_{n-1}$ of $T$
such that $n > 0$, $l_0 = l_n$, and for each $0 \leq i < n$, thread $T_{d_i}$ is in a critical
section on $l_i$ and requests lock $l_{i+1}$. A program $e$ may deadlock if its evaluation
may yield a deadlocked state, that is, if there exists a deadlocked state $S$ such
that $(\emptyset, \emptyset, e) \longmapsto^{*} S$.
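
The deadlock condition on a single state amounts to finding a cycle in a graph
over lock locations. The following Python sketch illustrates this under a
simplifying assumption: each thread is summarized as a pair of the set of locks
on which it is in a critical section and the lock it requests (or None). The
function name `is_deadlocked` and this representation are hypothetical.

```python
def is_deadlocked(threads):
    """Return True if the summarized state contains a cycle of lock requests.

    `threads` is a list of (held_locks, requested_lock_or_None) pairs. There is
    an edge l -> l' when some thread is in a critical section on l and requests
    l'; the state is deadlocked exactly when this graph has a cycle.
    """
    edges = {}
    for held, requested in threads:
        if requested is None:
            continue
        for lock in held:
            edges.setdefault(lock, set()).add(requested)

    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the DFS stack, finished
    color = {}

    def visit(lock):
        color[lock] = GREY
        for nxt in edges.get(lock, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:        # back edge: a cycle l_0, ..., l_n = l_0
                return True
            if state == WHITE and visit(nxt):
                return True
        color[lock] = BLACK
        return False

    return any(color.get(l, WHITE) == WHITE and visit(l) for l in list(edges))
```

Note that a single thread requesting a lock on which it is already in a critical
section forms a cycle of length one, which the definition above (with n = 1) also
counts as a deadlock.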
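Judgments (in addition to those of figure 2)

\[
\begin{array}{ll}
E \vdash m :: (\bar{m}_1, \bar{m}_2) & m \text{ is in the interval } (\bar{m}_1, \bar{m}_2) \text{ in } E \\
E \vdash m_1 \prec m_2 & m_1 \text{ is less than } m_2 \text{ in } E \\
E \vdash (\bar{m}_1, \bar{m}_2) & (\bar{m}_1, \bar{m}_2) \text{ is a well-formed, non-empty interval in } E
\end{array}
\]

Rules (partial list)

\[
\dfrac{E \vdash \diamond \qquad m \notin \mathit{dom}(E) \qquad E \vdash (\bar{m}_1, \bar{m}_2)}{E,\ m :: (\bar{m}_1, \bar{m}_2) \vdash \diamond}
\qquad
\dfrac{E \vdash \diamond \qquad \forall m_1 \in \bar{m}_1.\ \forall m_2 \in \bar{m}_2.\ E \vdash m_1 \prec m_2}{E \vdash (\bar{m}_1, \bar{m}_2)}
\]

\[
\dfrac{E,\ m :: (\bar{m}_1, \bar{m}_2),\ E' \vdash \diamond}{E,\ m :: (\bar{m}_1, \bar{m}_2),\ E' \vdash m :: (\bar{m}_1, \bar{m}_2)}
\qquad
\dfrac{\bar{m}_1 \subseteq \bar{m}_2 \qquad E \vdash L_2 \prec L_1 \text{ or } L_1 = L_2}{E \vdash (\bar{m}_1, L_1) <: (\bar{m}_2, L_2)}
\]

\[
\dfrac{E \vdash \diamond \qquad E \vdash m :: (\bar{m}_1, \bar{m}_2) \qquad m_1 \in \bar{m}_1}{E \vdash m_1 \prec m}
\qquad
\dfrac{E \vdash \diamond \qquad E \vdash m :: (\bar{m}_1, \bar{m}_2) \qquad m_2 \in \bar{m}_2}{E \vdash m \prec m_2}
\]

\[
\dfrac{E \vdash \diamond \qquad E \vdash m}{E \vdash m \prec \top}
\qquad
\dfrac{E \vdash \diamond \qquad E \vdash m}{E \vdash \bot \prec m}
\qquad
\dfrac{E \vdash m_1 \prec m \qquad E \vdash m \prec m_2}{E \vdash m_1 \prec m_2}
\]

\[
\dfrac{E; (\bar{m}, L) \vdash e : \mathit{Ref}_m\, t \qquad m \in \bar{m}}{E; (\bar{m}, L) \vdash {!e} : t}
\qquad
\dfrac{E; (\bar{m}, L) \vdash e_1 : \mathit{Ref}_m\, t \qquad E; (\bar{m}, L) \vdash e_2 : t \qquad m \in \bar{m}}{E; (\bar{m}, L) \vdash e_1 := e_2 : \mathit{Unit}}
\]

\[
\dfrac{E; (\emptyset, \bot) \vdash e : t}{E; (\emptyset, \top) \vdash \text{fork } e : \mathit{Unit}}
\qquad
\dfrac{E,\ m :: (\bar{m}_1, \bar{m}_2),\ x : m; p \vdash e : t \qquad E \vdash p \qquad E \vdash t}{E; p \vdash \text{new-lock } x{:}m :: (\bar{m}_1, \bar{m}_2) \text{ in } e : t}
\]

\[
\dfrac{E; (\bar{m}, L) \vdash e_1 : m \qquad E \vdash L \prec m \qquad E; (\bar{m} \cup \{m\}, m) \vdash e_2 : t}{E; (\bar{m}, L) \vdash \text{sync } e_1\ e_2 : t}
\]

Fig. 8. Extending the type system for deadlock elimination (highlights).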
In practice, deadlocks are commonly avoided by imposing a strict partial
order on locks, and respecting this order when acquiring locks [5]. We capture
this discipline by embodying it in an extension of our type system.

Our extended type system relies on annotations that specify a lock ordering. 
Whenever we introduce a singleton lock type, we must specify an appropriate 
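order between that lock type and the other lock types in the program. If $\bar{m}_1$ and
$\bar{m}_2$ are sets of lock types, then we use the notation $m :: (\bar{m}_1, \bar{m}_2)$ to mean that
the lock type $m$ is greater than each lock in $\bar{m}_1$ and is less than each lock in $\bar{m}_2$.
Thus the interval $(\bar{m}_1, \bar{m}_2)$ specifies a kind. In the extended language, we use
kinds of the form $(\bar{m}_1, \bar{m}_2)$ instead of the kind Lock:

\[
\begin{array}{rcl}
V \in \mathit{Value} & = & \ldots \mid \Lambda m :: (\bar{m}_1, \bar{m}_2).\ V \\
 & \mid & \text{pack } m :: (\bar{m}_1, \bar{m}_2) = n \text{ with } V \\
e \in \mathit{Exp} & = & \ldots \mid \text{new-lock } x{:}m :: (\bar{m}_1, \bar{m}_2) \text{ in } e \\
 & \mid & \text{open } e \text{ as } m :: (\bar{m}_1, \bar{m}_2),\ x : t \text{ in } e \\
s, t \in \mathit{Type} & = & \ldots \mid \exists m :: (\bar{m}_1, \bar{m}_2).\ t \\
 & \mid & \forall m :: (\bar{m}_1, \bar{m}_2).\ t
\end{array}
\]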

The type system ensures that locks are acquired in the appropriate order
using the notion of a locking level. A locking level $L$ is either a particular lock
type, in which case any greater lock can be acquired, or $\bot$, in which case all
locks can be acquired, or $\top$, in which case no lock can be acquired. We extend
permissions to include a locking-level component, so a permission is a pair of
a lock set and a locking level. The trivial, empty permission is $(\emptyset, \top)$, and the
initial permission (of forked threads and of the main program) is $(\emptyset, \bot)$.

We extend the typing environment $E$ to map type variables to intervals
$(\bar{m}_1, \bar{m}_2)$. The judgment $E \vdash m_1 \prec m_2$ expresses that $m_1$ is less than $m_2$; the
judgment $E \vdash (\bar{m}_1, \bar{m}_2)$ expresses that $(\bar{m}_1, \bar{m}_2)$ is a well-formed, non-empty
interval; and the judgment $E \vdash m :: (\bar{m}_1, \bar{m}_2)$ expresses that the lock type $m$ is
in the interval $(\bar{m}_1, \bar{m}_2)$. In a well-formed environment, the ordering constraints
on lock variables induce a strict partial order.

The necessary modifications to the type rules are outlined in figure 8. Most
of the rules are straightforward adaptations of earlier rules. The rule for fork
initializes a newly spawned thread with the locking level $\bot$, since that thread
is free to acquire any lock. The rule for sync ensures that locks are acquired in
increasing order. Collectively, the type rules check that threads respect the strict
partial order on locks, as required by the discipline for preventing deadlocks.
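
The ordering check performed for nested sync expressions can be sketched in
Python. The sketch assumes that the declared order is given as a set of
(lower, higher) pairs of lock-type names and that a thread is summarized by the
sequence of lock types it acquires in nested fashion; the names `less_than` and
`check_nesting` are hypothetical, and None stands in for the initial locking
level, below every lock type.

```python
def less_than(order, a, b):
    """Return True if a is strictly below b in the transitive closure of `order`.

    `order` is a set of (lower, higher) pairs declared by lock annotations.
    """
    stack, seen = [a], set()
    while stack:
        current = stack.pop()
        for lo, hi in order:
            if lo == current and hi not in seen:
                if hi == b:
                    return True
                seen.add(hi)
                stack.append(hi)
    return False


def check_nesting(order, acquisitions):
    """Check that nested acquisitions respect the order.

    The locking level starts at the bottom level (None here); each acquired
    lock must be strictly greater than the current level and then becomes the
    new level for the body.
    """
    level = None
    for lock in acquisitions:
        if level is not None and not less_than(order, level, lock):
            return False
        level = lock
    return True
```

With the order a below b below c, acquiring a and then c is accepted, while
acquiring b and then a, or a twice, is rejected.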

7 Related Work 

Race conditions and deadlocks have been studied for decades, for example in 
the program-verification literature. In this section, we mention some of the work
most closely related to ours. 

Warlock [22] is a system for detecting race conditions and deadlocks statically.
Its goals are similar to those of our type system. The major differences are that
Warlock is an implemented system applicable to substantial programs, and that
Warlock may fail to detect certain race conditions. This unsoundness is partly
due to the target language (ANSI C), and partly due to two other difficulties that 
are overcome by our system. First, Warlock works by tracing execution paths, 
but it fails to trace paths through loops or recursive function calls. Second, 
Warlock appears to merge different locks of the same type, and so may fail to 
detect inconsistent locking. 

Aiken and Gay [2] also investigate static race detection, in the somewhat 
different setting of SPMD programs. They present a system that has been used 
successfully on a variety of SPMD programs. Synchronization in these programs 
is performed using barriers. Since a barrier is a global operation not associated 
with any particular variable in the program, they do not develop machinery for 
tracking the association between reference cells and their protecting locks. 

A number of analyses have been developed for concurrent languages such as 
CML [20]. Nielson and Nielson [19] present an analysis that predicts process and 
channel utilization and uses this information for optimization. Their analysis is 
based on the notion of behaviors, which are similar to our permissions. Colby [9] 
also presents an analysis that infers information about channel usage. Neither 
work treats race conditions or deadlocks. 

Kobayashi [16] presents a first-order type system for a process calculus. His 
type system has a deadlock-free subset, and uses the notion of time tags, which 
are similar to our locking levels. Although Kobayashi considers some sophisti- 
cated determinism properties, he does not address race conditions directly. 

Abramsky, Gay, and Nagarajan [1] present another type-based technique for 
avoiding deadlocks. Their work is based on interaction categories inspired by 
linear logic. It emphasizes issues of type structure, rather than their application 
to a specific programming language. 

Dwyer and Clarke [11] describe a data-flow analysis for verifying certain 
correctness properties of concurrent programs, for example mutual exclusion on 
particular resources. The authors suggest that their analysis is not well suited for 
detecting global properties such as deadlock. Avrunin et al. [4] describe a toolset 
for analyzing concurrent programs. This toolset has been used for detecting race 
conditions and deadlocks in a variety of benchmarks, on a case-by-case basis.

Savage et al. [21] describe Eraser, a tool for detecting race conditions and 
deadlocks dynamically (rather than statically, as in our method). Although quite
effective, Eraser may fail to detect certain race conditions and deadlocks because
of insufficient test coverage. In general, static checking and testing are
complementary, and they should both be used in the development of reliable software.
Hybrid approaches (like that of the Cilk Determinator [8]) seem promising.

There is a large amount of work on model-checking of concurrent programs, 
particularly focused on finite-state systems (e.g., [7,10,12]). Recently, Godefroid
has applied model-checking techniques to C programs [13]; his approach,
stateless state-space exploration, relies on dynamic observation rather than static
analysis.

The permissions that we use are similar to effects [15,17,18] in that the per- 
mission of an expression constrains the effects that it may produce. Much work 
has been done on effect reconstruction [3,23,24,25]. It may be possible to adapt
these inference methods in our setting in order to remove the need for explicit 
lock annotations. 

8 Conclusions 

This paper describes how a type system can be used for avoiding two major 
pitfalls of multithreaded programming, namely race conditions and deadlocks. 
Our approach requires annotating programs with locking information. We be- 
lieve that this information is usually known to competent programmers and 
often implicit in documentation. However, it may be worthwhile to investigate 
algorithms for inferring the annotations. Such algorithms would be helpful in 
tackling larger examples and in extending our techniques to programming lan- 
guages more realistic than the one treated in this paper. Also helpful would be 
a mechanism for escaping from our type system when it proves too restrictive. 
We leave those issues for further work. 

For sequential languages, standard type systems provide a means for express- 
ing and checking fundamental correctness properties. We hope that type systems 
such as ours will play a similar role in the realm of multithreaded programming. 
Abstract. Constructor subtyping is a form of subtyping in which an
inductive type $\sigma$ is viewed as a subtype of another inductive type $\tau$ if $\tau$
has more constructors than $\sigma$. As suggested in [5,12], its (potential) uses
include proof assistants and functional programming languages.

In this paper, we introduce and study the properties of a simply typed
$\lambda$-calculus with record types and datatypes, and which supports record
subtyping and constructor subtyping. In the first part of the paper, we
show that the calculus is confluent and strongly normalizing. In the second
part of the paper, we show that the calculus admits a well-behaved
theory of canonical inhabitants, provided one adopts expansive
extensionality rules, including $\eta$-expansion, surjective pairing, and a suitable
expansion rule for datatypes. Finally, in the third part of the paper, we
extend our calculus with unbounded recursion and show that confluence
is preserved.



1 Introduction 

Type systems [3,8] lie at the core of modern functional programming languages, 
such as Haskell [28] or ML [26], and proof assistants, such as Coq [4] or PVS [32]. 
In order to improve the usability of these languages, it is important to devise 
flexible (and safe) type systems, in which programs and proofs may be written 
easily. A basic mechanism to enhance the flexibility of type systems is to endow
the set of types with a subtyping relation $\leq$ and to enforce a subsumption rule

\[
\dfrac{a : A \qquad A \leq B}{a : B}
\]

This basic mechanism of subtyping is powerful enough to capture a variety of 
concepts in computer science, see e.g. [9], and its use is spreading both in func- 
tional programming languages, see e.g. [25,30,31], and in proof assistants, see 
e.g. [7,24,32]. 

Constructor subtyping is a basic form of subtyping, suggested in [12] and
developed in [5], in which an inductive type $\sigma$ is viewed as a subtype of another
inductive type $\tau$ if $\tau$ has more constructors than $\sigma$. As such, constructor subtyping
captures in a type-theoretic context the ubiquitous use of subtyping as inclusion
between inductively defined sets. In its simplest instance, constructor subtyping
enforces subtyping from odd or even numbers to naturals, as illustrated in the
following example, which introduces in an ML-like syntax the mutually recursive
datatypes Odd and Even, and the Nat datatype:
datatype Odd  = s of Even
and      Even = 0
              | s of Odd ;

datatype Nat  = 0
              | s of Nat
              | s of Odd
              | s of Even ;

Here Even and Odd are subtypes of Nat (i.e. Even $\leq$ Nat and Odd $\leq$ Nat), since
every constructor of Even and Odd is also a constructor of Nat.
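
The check behind this observation can be illustrated with a short Python sketch.
It assumes that a datatype is represented as a set of declarations, each a pair
of a constructor name and an argument type name (None for a constant), so that an
overloaded constructor such as s simply contributes several pairs. The names
NAT, ODD, EVEN, and `is_constructor_subtype` are chosen for this example only.

```python
# Each datatype is modeled as a set of (constructor, argument type) declarations;
# None marks a constant constructor. This encoding is an assumption of the sketch.
NAT = {("0", None), ("s", "Nat"), ("s", "Odd"), ("s", "Even")}
ODD = {("s", "Even")}
EVEN = {("0", None), ("s", "Odd")}


def is_constructor_subtype(d, d_prime):
    """Return True if every declaration of d is also a declaration of d_prime."""
    return d <= d_prime
```

In this encoding both Even and Odd are below Nat, whereas Nat is not below Even,
because Nat has declarations that Even lacks.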

In a previous paper [5], the first author introduced and studied constructor 
subtyping for one first-order mutually recursive parametric datatype, and showed 
the calculus to be confluent and strongly normalizing. In the present paper, we 
improve on this work in several directions: 

1. we extend constructor subtyping to the class of strictly positive, mutually re- 
cursive and parametric datatypes. In addition, the present calculus supports 
incremental definitions; 

2. following recent trends in the design of proof assistants (and a well-established 
trend in the design of functional programming languages), we replace the 
elimination constructors of [5] by case-expressions. This leads to a simpler 
system, which is easier to use; 

3. we define a set of expansive extensionality rules, including $\eta$-expansion,
surjective pairing, and a suitable expansion rule for datatypes, so as to obtain
a well-behaved theory of canonical inhabitants (i.e. of closed expressions in
normal forms). The latter is fundamental for a proper semantical understanding
of the calculus and for several applications related to proof assistants,
such as unification.

The main technical contribution of this paper is to show that the calculus enjoys 
several fundamental meta-theoretical properties including confluence, subject 
reduction, strong normalization and a well-behaved theory of canonical inhabi- 
tants. These results lay the foundations for constructor subtyping and open the 
possibility of using constructor subtyping in programming languages and proof 
assistants, see Section 7. 

Organization of the paper The paper is organized as follows: in Section 2, we 
provide an informal account of constructor subtyping. In Section 3, we introduce 
a simply typed A-calculus with record types and datatypes, and which supports 
both record subtyping and constructor subtyping. In Section 4, we establish 
some fundamental meta-theoretical properties of the calculus. In Section 5, we 
motivate the use of expansive extensionality rules, show that they preserve con- 
fluence and strong normalization and lead to a well-behaved theory of canonical 
inhabitants. In Section 6, we extend our core language with fixpoint operators, 
and show the resulting calculus to be confluent. Finally, we conclude in Section 
7. Because of space constraints, proofs are merely sketched or omitted. We refer 
the reader to [6] for further details. 

Acknowledgments We are grateful to T. Altenkirch, P. Dybjer and L. Pinto 

datatype Odd  = s of Even
and      Even = 0
              | s of Odd ;

2 An informal account of constructor subtyping

Constructor subtyping formalizes the view that an inductively defined set $\sigma$ is
a subtype of an inductively defined set $\tau$ if $\tau$ has more constructors than $\sigma$. As
may be seen from the example of even, odd and natural numbers, the relative 
generality of constructor subtyping relies on the possibility for constructors to 
be overloaded and, to a lesser extent, on the possibility for datatypes to be 
defined in terms of previously introduced datatypes. The following example, 
which introduces the parametric datatypes List of lists and NeList of non- 
empty lists, provides further evidence. 

datatype ’a List = nil
                 | cons of (’a * ’a List) ;

datatype ’a NeList = cons of (’a * ’a List) ;

Here ’a NeList ≤ ’a List since the only constructor of ’a NeList, cons :
(’a * ’a List) -> ’a NeList, is matched by the constructor of ’a List, cons
: (’a * ’a List) -> ’a List.

The above examples reveal a possible pattern of constructor subtyping: for
two parametric datatypes $d$ and $d'$ with the same arity, we set $d \le d'$ if every
declaration (c in case of a constant, c of B otherwise) of $d$ is matched in $d'$.¹
Another pattern, used in [5], is to take subtyping as a primitive. Here we allow for
the subtyping relation to be specified directly in the definition of the datatype.
As shown below, such a pattern yields simpler definitions, with fewer declarations.

datatype Odd  = s of Even
and      Even = 0
              | s of Odd ;

datatype Nat = s of Nat
with Odd ≤ Nat,
     Even ≤ Nat ;

The original datatype may be recovered by adding a declaration of the form
$c : \sigma \to d'$ whenever $c : \sigma \to d$ and $d \le d'$. The same technique can be used to
define ’a List and ’a NeList:

datatype ’a List = nil
and ’a NeList = cons of (’a * ’a List)
with ’a NeList ≤ ’a List ;

For the clarity of the exposition, we shall adopt the second pattern in examples,
whereas we consider the first pattern in the formal definition of $\lambda_{\to,[],\mathrm{data}}$.

¹ For the sake of simplicity, we gloss over renamings and assume the parameters of $d$
and $d'$ to be identical.

Thus far, the subtyping relation is confined to datatypes. It may be extended
to types in the usual (structural) way. In this paper, we force datatypes to be
monotonic in their parameters. Hence, we can derive

Odd List ≤ Nat List

[l1 : Even, l2 : Nat List, l3 : Odd] ≤ [l1 : Nat, l2 : Nat List]

Nat -> Even NeList ≤ Odd -> Nat NeList

from the fact that Odd ≤ Nat, Even ≤ Nat and ’a NeList ≤ ’a List. The
formal definition of the subtyping relation is presented in the next section.
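
The three derived judgments can be checked mechanically. The Python sketch below is a syntax-directed reading of structural subtyping (contravariant arrow domains, width and depth subtyping on records, monotonic datatype parameters). The tuple encoding of types, the helper names and the table `BASE` are assumptions made for this illustration; `BASE` is taken to be already transitively closed.

```python
# Hypothetical base relation between datatype identifiers (assumed transitively closed).
BASE = {("Odd", "Nat"), ("Even", "Nat"), ("NeList", "List")}


def data(name, *args):
    return ("data", name, list(args))


def arrow(dom, cod):
    return ("arrow", dom, cod)


def record(**fields):
    return ("record", fields)


def sub(s, t):
    """Decide s <= t by structural recursion on the two types."""
    if s[0] == "data" and t[0] == "data":
        below = s[1] == t[1] or (s[1], t[1]) in BASE
        return (below and len(s[2]) == len(t[2])
                and all(sub(a, b) for a, b in zip(s[2], t[2])))
    if s[0] == "arrow" and t[0] == "arrow":
        # contravariant in the domain, covariant in the codomain
        return sub(t[1], s[1]) and sub(s[2], t[2])
    if s[0] == "record" and t[0] == "record":
        # every field required by t must be present in s at a subtype
        return all(l in s[1] and sub(s[1][l], ty) for l, ty in t[1].items())
    return False


odd, even, nat = data("Odd"), data("Even"), data("Nat")
print(sub(data("List", odd), data("List", nat)))
print(sub(record(l1=even, l2=data("List", nat), l3=odd),
          record(l1=nat, l2=data("List", nat))))
print(sub(arrow(nat, data("NeList", even)), arrow(odd, data("NeList", nat))))
```

Type variables are left out of this sketch, since the three examples are closed.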

In order to introduce strict overloading, which is a central concept in this 
paper, let us anticipate on the next section by considering the evaluation rule 
for case-expressions. Two observations can be made: first, our informal definition 
of datatype allows for arbitrary overloading of constructors. Second, it is not 
possible to define a type-independent evaluation rule for case-expressions for 
arbitrary datatypes. For example, consider the following datatype, where Sum is 
a datatype identifier of arity 2: 

datatype (’a,’b) Sum = inj of ’a
                     | inj of ’b ;

Note that the datatype is obtained from the usual definition of sum types by
overloading the constructors $\mathit{inj}_1$ and $\mathit{inj}_2$. Now, a case-expression for this datatype
should be of the form

case a of (inj x) => b1 | (inj x) => b2

with evaluation rules

case (inj a) of (inj x) => b1 | (inj x) => b2  ->  b1{x:=a}

case (inj a) of (inj x) => b1 | (inj x) => b2  ->  b2{x:=a}

As bl and b2 are arbitrary, the calculus is obviously not confluent. Thus one 
needs to impose some restrictions on overloading. One drastic solution to avoid 
non-confluence is to require constructors to be declared at most once in a given 
datatype, but this solution is too restrictive. A better solution is to require 
constructors to be declared “essentially” at most once in a given datatype. Here
“essentially” means that a constructor c may be declared several times in a
datatype d, provided that for every declaration c of rho, we have rho ≤
rhom where c of rhom is the first declaration of c in d. In other words, the only
purpose of repeated declarations is to enforce the desired subtyping constraints 
but (once subtyping is defined) only the first declaration needs to be used for 
typing expressions. This notion, which we call strict overloading, is mild enough 
to be satisfied by most datatypes that occur in the literature, see [5] for a longer 
discussion on this issue. 
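
The strict overloading condition can be phrased as a simple check over an ordered list of declarations. The following Python sketch makes several assumptions for illustration: argument types are atomic names, the relation `BASE` and the helper `atom_sub` stand in for the real subtyping relation, and the expanded declaration list for Nat is only one possible ordering.

```python
# Hypothetical subtyping on atomic argument types.
BASE = {("Odd", "Nat"), ("Even", "Nat")}


def atom_sub(s, t):
    return s == t or (s, t) in BASE


def strictly_overloaded(declarations, sub):
    """declarations: ordered list of (constructor, argument type).

    Every repeated declaration 'c of rho' must satisfy rho <= rhom,
    where 'c of rhom' is the first declaration of c in the list.
    """
    first = {}
    for c, rho in declarations:
        if c not in first:
            first[c] = rho
        elif not sub(rho, first[c]):
            return False
    return True


# Nat with the declarations inherited from Odd and Even made explicit.
NAT_DECLS = [("s", "Nat"), ("0", None), ("s", "Even"), ("s", "Odd")]
# The overloaded sum type: 'b is unrelated to 'a, so the check fails.
SUM_DECLS = [("inj", "'a"), ("inj", "'b")]

print(strictly_overloaded(NAT_DECLS, atom_sub))
print(strictly_overloaded(SUM_DECLS, atom_sub))
```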

We conclude this section with further examples of datatypes. Firstly, we
define a datatype of ordinals (or, better said, of ordinal notations). Note that the
datatype is a higher-order one, because of the constructor lim, which takes a
function as input.

datatype Ord = s of Ord | lim of (Nat -> Ord)
with Nat ≤ Ord ;

Secondly, we define a datatype of binary integers. These datatypes are part of the
Coq library, but Coq does not take advantage of constructor subtyping.

datatype positive = xH | xI of positive | xO of positive ;

datatype natural = ZERO
with positive ≤ natural ;

datatype integer = NEG of positive
with natural ≤ integer ;

Thirdly, and as pointed out in [5,12], constructor subtyping provides a suitable 
framework in which to formalize programming languages, including the object 
calculi of Abadi and Cardelli [1] and a variety of other languages taken from [29]. 
Yet another example of a language that can be expressed with constructor
semantics is mini-ML [22], as shown below. Here we consider four datatype identifiers:
E for expressions, I for identifiers, P for patterns and N for the null pattern, all with
arity 0.

datatype I = ident ;
datatype N = nullpat ;
datatype P = pairpat of (P * P)
with I ≤ P, N ≤ P ;
datatype E = num | false | true | lamb of (P * E)
           | if of (E * E * E) | mlpair of (E * E)
           | apply of (E * E) | let of (P * E * E)
           | letrec of (P * E * E)
with I ≤ E, N ≤ E ;


Lastly, we conclude with a definition of CTL* formulae, see [15]. In this
example, we consider two datatype identifiers, SF of state formulae and PF of path
formulae, both with arity 1.

datatype ’a SF = i of (’a * ’a SF) | conj of (’a SF * ’a SF)
               | not of ’a SF | forsomefuture of ’a PF
               | forallfuture of ’a PF
and ’a PF = conj of (’a PF * ’a PF) | not of ’a PF
          | nexttime of ’a PF | until of ’a PF
with ’a SF ≤ ’a PF ;

CTL* and related temporal logics provide suitable frameworks in which to verify 
the correctness of programs and protocols, and hence are interesting calculi to 
formalize in proof assistants. 

3 A core calculus $\lambda_{\to,[],\mathrm{data}}$

In this section, we introduce the core calculus $\lambda_{\to,[],\mathrm{data}}$. The first subsection is
devoted to types, datatypes and subtyping; the second subsection is devoted to
expressions, reduction and typing.

3.1 Types and subtyping 

Below we assume given some pairwise disjoint sets $\mathcal{L}$ of labels, $\mathcal{D}$ of datatype
identifiers, $\mathcal{C}$ of constructor identifiers and $\mathcal{X}$ of type variables. Moreover, we
let $l, l', l_i, \ldots$ range over $\mathcal{L}$, $d, d', \ldots$ range over $\mathcal{D}$, $c, c', c_i, \ldots$ range over $\mathcal{C}$ and
$\alpha, \alpha', \alpha_i, \beta, \ldots$ range over $\mathcal{X}$. In addition, we assume that every datatype
identifier $d$ has a fixed arity $\mathrm{ar}(d)$ and that $\alpha_1, \alpha_2, \ldots$ is a fixed enumeration of $\mathcal{X}$.



Definition 1 (Types). The set $\mathcal{T}$ of types is given by the abstract syntax:

\[ \sigma, \tau := d[\tau_1, \ldots, \tau_{\mathrm{ar}(d)}] \mid \alpha \mid \sigma \to \tau \mid [l_1 : \sigma_1, \ldots, l_n : \sigma_n] \]

where in the last clause it is assumed that the $l_i$s are pairwise distinct. By
convention, we identify record types that only differ in the order of their declarations,
such as $[l : \sigma, l' : \tau]$ and $[l' : \tau, l : \sigma]$.

We now turn to the definition of datatype. Informally, a datatype is a list of
constructor declarations, i.e. of pairs $(c, \tau)$ where $c$ is a constructor identifier and
$\tau$ is a constructor type, i.e. a type of the form

\[ \rho_1 \to \ldots \to \rho_n \to d[\alpha_1, \ldots, \alpha_{\mathrm{ar}(d)}] \]

with $d \in \mathcal{D}$. However, not all datatypes are valid. In order for a datatype to be
valid, it must satisfy several properties.

1. Constructors must be strictly positive, so that datatypes have a direct
set-theoretic interpretation. For example, $c_1 : \mathrm{nat} \to d$ and $c_2 : (\mathrm{nat} \to d) \to d$
are strictly positive w.r.t. $d$, whereas $c_3 : (d \to d) \to d$ is not.

2. Parameters must appear positively in the domains of constructor types, so
that datatypes are monotonic in their parameters. For example, the
parameter $\alpha$ appears positively in the domain of $\alpha \to d[\alpha]$, while it appears
negatively in the domain of $(\alpha \to \mathrm{nat}) \to d[\alpha]$.

3. Datatypes that mutually depend on each other must have the same number 
of parameters, for the sake of simplicity. 

4. Constructors must be strictly overloaded, so that case-expressions can be 
evaluated unambiguously. 
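
Condition 1 can be illustrated on the three constructors $c_1$, $c_2$ and $c_3$. The Python sketch below assumes the usual reading of strict positivity (the datatype identifier never occurs to the left of an arrow inside an argument type); the paper itself defers the precise definition to [17]. The encoding of types as strings and `("arrow", dom, cod)` tuples, and the function names, are hypothetical.

```python
def occurs(d, ty):
    """True when the datatype identifier d occurs anywhere in ty."""
    if isinstance(ty, str):
        return ty == d
    _, dom, cod = ty
    return occurs(d, dom) or occurs(d, cod)


def spos(ty, d):
    """ty is strictly positive w.r.t. d: d never occurs left of an arrow."""
    if isinstance(ty, str):
        return True
    _, dom, cod = ty
    return not occurs(d, dom) and spos(cod, d)


def constructor_ok(arg_types, d):
    """Check every argument type of a constructor with target d."""
    return all(spos(rho, d) for rho in arg_types)


c1_args = ["nat"]                    # c1 : nat -> d
c2_args = [("arrow", "nat", "d")]    # c2 : (nat -> d) -> d
c3_args = [("arrow", "d", "d")]      # c3 : (d -> d) -> d

print(constructor_ok(c1_args, "d"))
print(constructor_ok(c2_args, "d"))
print(constructor_ok(c3_args, "d"))
```

Records and parametric datatypes are omitted from the encoding to keep the sketch short.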

In addition, we allow datatypes to depend on previously defined datatypes. This
leads us naturally to the notion of datatype context. Informally, a datatype
context is a finite list of datatypes. Below we let $\sigma, \tau$ range over types, $\mathcal{H}$ range
over datatype contexts, $c$ range over datatype constructors and $d, d'$ range over
datatype identifiers.

Definition 2.

1. $\sigma$ is a legal type in $\mathcal{H}$ with variables in $\{\alpha_1, \ldots, \alpha_k\}$ (or $\emptyset$ if $k = 0$), written
$\mathcal{H} \vdash_k \sigma\ \mathrm{type}$, is defined by the rules of Figure 1;

2. $\sigma$ is a subtype of $\tau$ in $\mathcal{H}$, written $\mathcal{H} \vdash \sigma \le \tau$, is defined by the rules of Figure
2, where $\mathcal{H} \vdash d \le d'$ if

- $\mathrm{ar}(d) = \mathrm{ar}(d') = m$;

- every declaration $c : \tau_1 \to \ldots \to \tau_n \to d[\alpha_1, \ldots, \alpha_m]$ in $\mathcal{H}$ is matched
by another declaration $c : \tau_1 \to \ldots \to \tau_n \to d'[\alpha_1, \ldots, \alpha_m]$ in $\mathcal{H}$;

3. $\tau$ is a $d$-constructor type in $\mathcal{H}$, written $\mathcal{H} \vdash \tau\ \mathrm{coty}(d)$, is defined by the rules
of Figure 3, where:

- $\alpha$ appears positively in $\tau$, written $\alpha\ \mathrm{pos}\ \tau$, is defined as in [17];

- $\rho$ is strictly positive w.r.t. $d$, written $\rho\ \mathrm{spos}\ d$, is defined as in [17];

- $d \in \mathcal{H}$ if there exists a declaration $(c : \tau) \in \mathcal{H}$ in which $d$ occurs;

4. $\mathcal{H}$ is a legal datatype context, written $\mathcal{H}\ \mathrm{legal}$, is defined by the rules of Figure
4, where $\mathcal{H}\ \mathrm{compatible}\ D, c : \tau$ if

- for every $(c : \tau') \in D$, $\mathcal{H} \vdash \tau'\ \mathrm{coty}(d) \Rightarrow \mathcal{H} \vdash \tau' \le \tau$;

- for every $(c' : \tau') \in D$, $\mathcal{H} \vdash \tau'\ \mathrm{coty}(d') \Rightarrow \mathrm{ar}(d) = \mathrm{ar}(d')$.

In addition, we say $c : \tau$ is a main $d$-declaration if it is the first declaration of
the form $c : \tau'$ with $\mathcal{H} \vdash \tau'\ \mathrm{coty}(d)$.

A special case of constructor type is given by the rule

\[ \frac{\mathcal{H} \vdash_0 \rho_i\ \mathrm{type} \ \lor\ \rho_i \in \{\alpha_1, \ldots, \alpha_{\mathrm{ar}(d)}, d[\alpha_1, \ldots, \alpha_k]\} \quad (1 \le i \le n)}{\mathcal{H} \vdash \rho_1 \to \ldots \to \rho_n \to d[\alpha_1, \ldots, \alpha_k]\ \mathrm{coty}(d)} \]

Note that conditions 3 and 4 above are enforced by the side-conditions in
(add-cons) whereas conditions 1 and 2 above are enforced by the rule (coty). Also
note that in the side condition for (add-cons), $\tau'$ and $\tau$ are compared w.r.t. $\mathcal{H}$
and not $\mathcal{H}; D$.



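\[
\begin{array}{ll}
(\to) & \dfrac{\mathcal{H} \vdash_k \sigma\ \mathrm{type} \quad \mathcal{H} \vdash_k \tau\ \mathrm{type}}{\mathcal{H} \vdash_k \sigma \to \tau\ \mathrm{type}} \\[2ex]
([\,]) & \dfrac{\mathcal{H} \vdash_k \sigma_i\ \mathrm{type} \quad (1 \le i \le n)}{\mathcal{H} \vdash_k [l_1 : \sigma_1, \ldots, l_n : \sigma_n]\ \mathrm{type}} \\[2ex]
(\mathrm{data}) & \dfrac{d \in \mathcal{H} \quad \mathcal{H} \vdash_k \sigma_i\ \mathrm{type} \quad (1 \le i \le \mathrm{ar}(d))}{\mathcal{H} \vdash_k d[\vec{\sigma}]\ \mathrm{type}} \\[2ex]
(\alpha) & \dfrac{\mathcal{H}\ \mathrm{legal}}{\mathcal{H} \vdash_k \alpha_i\ \mathrm{type}} \quad (1 \le i \le k)
\end{array}
\]

Fig. 1. Type formation rules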
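
\[
\begin{array}{ll}
(\le_{\mathrm{refl}}) & \dfrac{\mathcal{H} \vdash_k \sigma\ \mathrm{type}}{\mathcal{H} \vdash \sigma \le \sigma} \\[2ex]
(\le_{\mathrm{trans}}) & \dfrac{\mathcal{H} \vdash \sigma \le \tau \quad \mathcal{H} \vdash \tau \le \rho}{\mathcal{H} \vdash \sigma \le \rho} \\[2ex]
(\le_{\to}) & \dfrac{\mathcal{H} \vdash \sigma' \le \sigma \quad \mathcal{H} \vdash \tau \le \tau'}{\mathcal{H} \vdash \sigma \to \tau \le \sigma' \to \tau'} \\[2ex]
(\le_{[\,]}) & \dfrac{\mathcal{H} \vdash \sigma_i \le \tau_i \quad (1 \le i \le n) \qquad \mathcal{H} \vdash_k \sigma_j\ \mathrm{type} \quad (n+1 \le j \le n+m)}{\mathcal{H} \vdash [l_1 : \sigma_1, \ldots, l_{n+m} : \sigma_{n+m}] \le [l_1 : \tau_1, \ldots, l_n : \tau_n]} \\[2ex]
(\le_{\mathrm{data}}) & \dfrac{\mathcal{H} \vdash d \le d' \quad \mathcal{H} \vdash \sigma_i \le \tau_i \quad (1 \le i \le \mathrm{ar}(d))}{\mathcal{H} \vdash d[\vec{\sigma}] \le d'[\vec{\tau}]}
\end{array}
\]

Fig. 2. Subtyping rules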



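\[
(\mathrm{coty}) \quad \dfrac{\mathcal{H} \vdash_k \rho_i\ \mathrm{type} \quad \rho_i\ \mathrm{spos}\ d \quad \alpha_j\ \mathrm{pos}\ \rho_i \quad (1 \le i \le n,\ 1 \le j \le k)}{\mathcal{H} \vdash \rho_1 \to \ldots \to \rho_n \to d[\alpha_1, \ldots, \alpha_k]\ \mathrm{coty}(d)}
\]

Fig. 3. Constructor type rule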



3.2 Expressions and typing 

In this subsection, we conclude the definition of $\lambda_{\to,[],\mathrm{data}}$ by defining its
expressions, specifying their computational behavior and providing them with a typing
system. Below we assume given a set $\mathcal{V}$ of variables and let $x, x', x_i, y, \ldots$ range
over $\mathcal{V}$. Moreover, we assume given a legal datatype context $\mathcal{H}$ and let $\mathcal{T}_{\mathcal{H}}$ be the
set of legal types in $\mathcal{H}$; finally $\sigma, \tau, \ldots$ are assumed to range over $\mathcal{T}_{\mathcal{H}}$.

Definition 3. The set $\mathcal{E}$ of expressions is given by the abstract syntax:

\[ a, b := x \mid \lambda x{:}\tau.\ a \mid a\ b \mid [l_1 = a_1, \ldots, l_n = a_n] \mid a.l \mid c[\vec{\sigma}]\ \vec{a} \mid \mathrm{case}^{\sigma}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow b_1 \mid \ldots \mid c_n \Rightarrow b_n\} \]



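\[
\begin{array}{ll}
(\mathrm{empty}) & \dfrac{}{\cdot\ \mathrm{legal}} \\[2ex]
(\mathrm{close}) & \dfrac{\mathcal{H}; D\ \mathrm{legal}}{\mathcal{H}, D\ \mathrm{legal}} \\[2ex]
(\mathrm{add\text{-}cons}) & \dfrac{\mathcal{H}; D\ \mathrm{legal} \quad \mathcal{H} \vdash \tau\ \mathrm{coty}(d) \quad \mathcal{H}\ \mathrm{compatible}\ D, c : \tau}{\mathcal{H}; D, c : \tau\ \mathrm{legal}}
\end{array}
\]

Fig. 4. Datatype rules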

Free and bound variables and substitution $.\{. := .\}$ are defined the usual way.
Moreover, we assume standard variable conventions [2] and identify record
expressions which only differ in the order of their components, e.g. $[l = a, l' = a']$
and $[l' = a', l = a]$. All the constructions are the usual ones, except perhaps for
case-expressions, which are typed so as to avoid failure of subject reduction, see
e.g. [19], and are slightly different from the usual case expressions in that we
pattern-match against constructors rather than against patterns.

Definition 4 (Typing).

1. A context $\Gamma$ is a finite set of assumptions $x_1 : \tau_1, \ldots, x_n : \tau_n$ such that the
$x_i$s are pairwise distinct elements of $\mathcal{V}$ and $\tau_i \in \mathcal{T}_{\mathcal{H}}$.

2. A judgment is a triple of the form $\Gamma \vdash a : \tau$, where $\Gamma$ is a context, $a \in \mathcal{E}$
and $\tau \in \mathcal{T}_{\mathcal{H}}$.

3. A judgment is derivable if it can be inferred from the rules of Figure 5,
where in the (case) rule it is assumed that $c_1 : \tau_1, \ldots, c_n : \tau_n$ are the sole
main $d$-declarations and that $\tau^{\sigma}$ denotes $\xi_1 \to \ldots \to \xi_n \to \sigma$ whenever
$\tau = \xi_1 \to \ldots \to \xi_n \to d[\vec{\rho}]$.

4. An expression $a \in \mathcal{E}$ is typable if $\Gamma \vdash a : \sigma$ for some context $\Gamma$ and type $\sigma$.



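\[
\begin{array}{ll}
(\mathrm{start}) & \Gamma \vdash x : \tau \quad \text{if } x : \tau \in \Gamma \\[2ex]
(\mathrm{application}) & \dfrac{\Gamma \vdash e : \tau \to \sigma \quad \Gamma \vdash e' : \tau}{\Gamma \vdash e\ e' : \sigma} \\[2ex]
(\mathrm{abstraction}) & \dfrac{\Gamma, x : \tau \vdash e : \sigma}{\Gamma \vdash \lambda x{:}\tau.\ e : \tau \to \sigma} \\[2ex]
(\mathrm{record}) & \dfrac{\Gamma \vdash e_i : \tau_i \quad (1 \le i \le n)}{\Gamma \vdash [l_1 = e_1, \ldots, l_n = e_n] : [l_1 : \tau_1, \ldots, l_n : \tau_n]} \\[2ex]
(\mathrm{select}) & \dfrac{\Gamma \vdash e : [l_1 : \tau_1, \ldots, l_n : \tau_n]}{\Gamma \vdash e.l_i : \tau_i} \quad \text{if } 1 \le i \le n \\[2ex]
(\mathrm{constructor}) & \dfrac{\Gamma \vdash b_i : \rho_i\{\vec{\alpha} := \vec{\tau}\} \quad (1 \le i \le k)}{\Gamma \vdash c[\vec{\tau}]\ \vec{b} : d[\vec{\tau}]} \quad \text{if } c : \rho_1 \to \ldots \to \rho_k \to d[\vec{\alpha}] \\[2ex]
(\mathrm{case}) & \dfrac{\Gamma \vdash a : d[\vec{\rho}] \quad \Gamma \vdash b_i : (\tau_i\{\vec{\alpha} := \vec{\rho}\})^{\sigma} \quad (1 \le i \le n)}{\Gamma \vdash \mathrm{case}^{\sigma}_{d[\vec{\rho}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow b_1 \mid \ldots \mid c_n \Rightarrow b_n\} : \sigma} \\[2ex]
(\mathrm{subsumption}) & \dfrac{\Gamma \vdash e : \tau}{\Gamma \vdash e : \sigma} \quad \text{if } \tau \le \sigma
\end{array}
\]

Fig. 5. Typing rules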

The computational behavior of $\lambda_{\to,[],\mathrm{data}}$ is drawn from the usual notion of
$\beta$-reduction, $\iota$-reduction and $\pi$-reduction.

Definition 5.

1. $\beta$-reduction $\to_\beta$ is defined as the compatible closure of the rule

\[ (\lambda x{:}\sigma.\ a)\ b \to_\beta a\{x := b\} \]

2. $\pi$-reduction $\to_\pi$ is defined as the compatible closure of the rule

\[ [l_1 = a_1, \ldots, l_n = a_n].l_i \to_\pi a_i \]

3. $\iota$-reduction $\to_\iota$ is defined as the compatible closure of the rule

\[ \mathrm{case}^{\sigma}_{d[\vec{\tau}']}\ (c_i[\vec{\tau}]\ \vec{a})\ \mathrm{of}\ \{c_1 \Rightarrow f_1 \mid \ldots \mid c_n \Rightarrow f_n\} \to_\iota f_i\ \vec{a} \]

4. $\to_{\mathrm{basic}}$ is defined as $\to_\beta \cup \to_\pi \cup \to_\iota$.

5. $\twoheadrightarrow_{\mathrm{basic}}$ and $=_{\mathrm{basic}}$ are respectively defined as the reflexive-transitive and the
reflexive-symmetric-transitive closures of $\to_{\mathrm{basic}}$.

Note that we do not require $\vec{\tau}$ and $\vec{\tau}'$ to coincide in the definition of $\iota$-reduction
as it would lead to too weak an equational theory. However, the typing rules will
enforce $\vec{\tau} \le \vec{\tau}'$ on legal terms.
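
To make the three basic rules concrete, here is a small untyped normalizer in Python. It is a sketch under explicit assumptions: terms are encoded as tagged tuples (a hypothetical encoding), type annotations are dropped, substitution is not capture-avoiding and so relies on the variable convention, and termination is only expected on terms that would be typable in the calculus.

```python
def subst(t, x, v):
    """Replace free occurrences of variable x in t by v (no capture avoidance)."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    if tag == "rec":
        return ("rec", {l: subst(a, x, v) for l, a in t[1].items()})
    if tag == "sel":
        return ("sel", subst(t[1], x, v), t[2])
    if tag == "con":
        return ("con", t[1], [subst(a, x, v) for a in t[2]])
    return ("case", subst(t[1], x, v), {c: subst(b, x, v) for c, b in t[2].items()})


def step(t):
    """One beta, pi or iota step at the root, or None if no rule applies there."""
    tag = t[0]
    if tag == "app" and t[1][0] == "lam":        # beta
        return subst(t[1][2], t[1][1], t[2])
    if tag == "sel" and t[1][0] == "rec":        # pi
        return t[1][1][t[2]]
    if tag == "case" and t[1][0] == "con":       # iota: apply branch to arguments
        result = t[2][t[1][1]]
        for arg in t[1][2]:
            result = ("app", result, arg)
        return result
    return None


def normalize(t):
    """Normalize subterms first, then keep reducing at the root."""
    tag = t[0]
    if tag == "lam":
        t = ("lam", t[1], normalize(t[2]))
    elif tag == "app":
        t = ("app", normalize(t[1]), normalize(t[2]))
    elif tag == "rec":
        t = ("rec", {l: normalize(a) for l, a in t[1].items()})
    elif tag == "sel":
        t = ("sel", normalize(t[1]), t[2])
    elif tag == "con":
        t = ("con", t[1], [normalize(a) for a in t[2]])
    elif tag == "case":
        t = ("case", normalize(t[1]), {c: normalize(b) for c, b in t[2].items()})
    reduced = step(t)
    return t if reduced is None else normalize(reduced)


zero = ("con", "0", [])
one = ("con", "s", [zero])
# case (s 0) of {0 => 0 | s => \n. n}: an iota step followed by a beta step
pred_one = ("case", one, {"0": zero, "s": ("lam", "n", ("var", "n"))})
print(normalize(pred_one) == zero)
```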

4 Meta-theory of the core language 

In this section, we summarize some basic properties of the core language. 

Proposition 1 (Confluence). $\to_{\mathrm{basic}}$ is confluent:

\[ a =_{\mathrm{basic}} b \ \Rightarrow\ \exists c \in \mathcal{E}.\ a \twoheadrightarrow_{\mathrm{basic}} c \ \land\ b \twoheadrightarrow_{\mathrm{basic}} c \]

Proof. By the standard technique of Tait and Martin-Löf.

Proposition 2 (Subject reduction). Typing is closed under $\to_{\mathrm{basic}}$:

\[ \Gamma \vdash a : \sigma \ \land\ a \to_{\mathrm{basic}} b \ \Rightarrow\ \Gamma \vdash b : \sigma \]

Proof. By induction on the structure of the derivations, using some basic
properties of subtyping.

As usual, we say that an expression $e$ is strongly normalizing with respect to a
relation $\to$ if there is no infinite sequence

\[ e \to e_1 \to e_2 \to \ldots \]

We let $\mathrm{SN}(\to)$ denote the set of expressions that are strongly normalizing with
respect to $\to$.

Proposition 3 (Strong normalization). $\to_{\mathrm{basic}}$ is strongly normalizing on
typable expressions:

\[ \Gamma \vdash a : \sigma \ \Rightarrow\ a \in \mathrm{SN}(\to_{\mathrm{basic}}) \]

Proof. By a standard computability argument.

We now turn to type-checking. One cannot rely on the existence of minimal types,
as they may not exist (for minimal types to exist, one must require datatypes
to be pre-regular, see e.g. [5,18]). Instead, we can define for every context $\Gamma$ and
expression $a$ a finite set $\min_\Gamma(a)$ of minimal types such that

\[ \sigma \in \min\nolimits_\Gamma(a) \ \Rightarrow\ \Gamma \vdash a : \sigma \]

\[ \Gamma \vdash a : \sigma \ \Rightarrow\ \exists \tau \in \min\nolimits_\Gamma(a).\ \tau \le \sigma \]

The set $\min_\Gamma(a)$, which is defined in the obvious way, is finite because there are
only finitely many declarations for each constructor.

Proposition 4. Type-checking is decidable: there exists an algorithm to decide
whether a given judgment $\Gamma \vdash a : \sigma$ is derivable.

Proof. Proceed in two steps: first compute $\min_\Gamma(a)$, second check whether there
exists $\tau \in \min_\Gamma(a)$ such that $\tau \le \sigma$.



5 Extensionality 

5.1 Motivations 

Extensionality, as embodied e.g. in $\eta$-conversion, is a basic feature of many type
systems. Traditionally, extensionality equalities are oriented as contractive rules:
e.g. $\eta$-conversion is oriented as $\eta$-reduction. On the other hand, expansive rules
provide an alternative computational interpretation of extensionality equalities:
e.g. $\eta$-conversion may be oriented as $\eta$-expansion. Expansive extensionality rules
have numerous applications in categorical rewriting, unification and partial
evaluation. In addition to these traditional motivations, which are nicely summarized
in [13], subtyping adds some new fundamental reasons to use expansive rules:

1. contractive rules lead to non-confluent calculi, even on well-typed
expressions: if we adopt $\eta$-reduction for $\lambda$-abstractions, then the following critical
pair cannot be solved:

\[ \lambda x{:}\tau.\ x \ \leftarrow_\beta\ \lambda x{:}\tau.\ (\lambda y{:}\sigma.\ y)\ x \ \to_\eta\ \lambda y{:}\sigma.\ y \]

On the other hand, $\lambda x{:}\tau.\ (\lambda y{:}\sigma.\ y)\ x$ is well-typed (of type $\tau \to \sigma$) whenever
$\tau \le \sigma$ (this observation is due to Mitchell, Hoang and Howard [27]). A similar
remark applies to datatypes: if we adopt $\mu$-reduction for lists, as defined by

\[ \mathrm{case}^{\mathrm{list}[\tau]}_{\mathrm{list}[\tau]}\ e\ \mathrm{of}\ \{\mathrm{nil} \Rightarrow \mathrm{nil}[\tau] \mid \mathrm{cons} \Rightarrow \lambda a{:}\tau.\ \lambda l{:}\mathrm{list}[\tau].\ \mathrm{cons}[\tau]\ a\ l\} \ \to\ e \]

then the following critical pair cannot be solved:

\[ \mathrm{nil}[\tau] \ \leftarrow\ M \ \to\ \mathrm{nil}[\sigma] \]

where $M = \mathrm{case}^{\mathrm{list}[\tau]}_{\mathrm{list}[\tau]}\ (\mathrm{nil}[\sigma])\ \mathrm{of}\ \{\mathrm{nil} \Rightarrow \mathrm{nil}[\tau] \mid \mathrm{cons} \Rightarrow \lambda a{:}\tau.\ \lambda l{:}\mathrm{list}[\tau].\ \mathrm{cons}[\tau]\ a\ l\}$.
On the other hand, $M$ is well-typed (of type $\mathrm{list}[\tau]$) whenever $\sigma \le \tau$.

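2. contractive rules lead to calculi with too many canonical inhabitants (i.e.
closed expressions in normal form): if we adopt $\mu$-reduction for lists then
the following expressions are canonical inhabitants of $\mathrm{list}[\tau]$, provided $\sigma \le \tau$,
$a : \sigma$ and $l : \mathrm{list}[\sigma]$:

\[ \mathrm{nil}[\sigma] \qquad \mathrm{nil}[\tau] \qquad \mathrm{cons}[\sigma]\ a\ l \qquad \mathrm{cons}[\tau]\ a\ l \]

On the other hand, one would expect canonical inhabitants of $\mathrm{list}[\tau]$ to be
of the form

\[ \mathrm{nil}[\tau] \qquad \mathrm{cons}[\tau]\ a\ l \]

where in the second case $l$ itself is a canonical inhabitant of $\mathrm{list}[\tau]$ and $a$ is
a canonical inhabitant of $\tau$. Remarkably we obtain the desired effect if we
reverse $\mu$-reduction. With this new reduction rule, which we call $\mu$-expansion
and denote by $\to_\mu$, we have:

\[ \mathrm{nil}[\sigma] \ \to_\mu\ \mathrm{case}^{\mathrm{list}[\tau]}_{\mathrm{list}[\tau]}\ \mathrm{nil}[\sigma]\ \mathrm{of}\ \{\mathrm{nil} \Rightarrow \mathrm{nil}[\tau] \mid \mathrm{cons} \Rightarrow \mathrm{cons}[\tau]\} \ \to_\iota\ \mathrm{nil}[\tau] \]

Similarly, for $a : \sigma$ and $l : \mathrm{list}[\sigma]$, one has:

\[ \mathrm{cons}[\sigma]\ a\ l \ \to_\mu\ \mathrm{case}^{\mathrm{list}[\tau]}_{\mathrm{list}[\tau]}\ (\mathrm{cons}[\sigma]\ a\ l)\ \mathrm{of}\ \{\mathrm{nil} \Rightarrow \mathrm{nil}[\tau] \mid \mathrm{cons} \Rightarrow \mathrm{cons}[\tau]\} \ \to_\iota\ \mathrm{cons}[\tau]\ a\ l \]

(Strictly speaking, expansive extensionality rules are defined relative to a
context and a type and the above reductions are performed at type $\mathrm{list}[\tau]$);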

3. expansive rules provide a simple but useful program optimization: if we adopt 
expansive rules for records, the expression [n = 3, c = blue] reduces at type 
[n : nat] to [n = 3] , thus throwing out the irrelevant fields at type [n : nat] . 
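
The record optimization of item 3 can be replayed in a few lines of Python. The sketch assumes a hypothetical tuple encoding of terms (`("rec", fields)`, `("sel", term, label)`, `("lit", value)`) and implements only the two steps involved: surjective pairing at a given record type, with the proviso that records already of the expected shape are left alone, followed by $\pi$-reduction inside the fields.

```python
def sp_expand(a, labels):
    """Surjective pairing at record type [l1, ..., ln]:
    a becomes [l1 = a.l1, ..., ln = a.ln], unless a already has that shape."""
    if a[0] == "rec" and set(a[1]) == set(labels):
        return a
    return ("rec", {l: ("sel", a, l) for l in labels})


def pi_reduce_fields(r):
    """Apply pi-reduction [.., l = v, ..].l -> v inside each field of record r."""
    fields = {}
    for l, body in r[1].items():
        if body[0] == "sel" and body[1][0] == "rec":
            fields[l] = body[1][1][body[2]]
        else:
            fields[l] = body
    return ("rec", fields)


original = ("rec", {"n": ("lit", 3), "c": ("lit", "blue")})
# Expanding at type [n : nat] and projecting drops the irrelevant field c.
trimmed = pi_reduce_fields(sp_expand(original, ["n"]))
print(trimmed)
```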

We therefore embark upon studying an expansive interpretation of extensionality
in $\lambda_{\to,[],\mathrm{data}}$.
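
5.2 Expansive extensionality rules

The computational behavior of the calculus is now obtained by aggregating the
expansive extensionality rules to $\to_{\mathrm{basic}}$. Expansive extensionality rules need to
be formulated in a typed framework so we consider judgments of the form

\[ \Gamma \vdash a \to b : \sigma \]

For the sake of uniformity, we first reformulate $\to_{\mathrm{basic}}$ in a typed framework.

Definition 6.

1. Typed basic-reduction $\to_{\mathrm{basic}}$ is defined by the clause

\[ \Gamma \vdash a \to_{\mathrm{basic}} b : \sigma \]

iff $\Gamma \vdash a : \sigma$ and $a \to_{\mathrm{basic}} b$.

2. $\eta$-expansion $\to_\eta$ is defined as the quasi-compatible closure (see below) of the
rule

\[ \Gamma \vdash a \to_\eta \lambda x{:}\tau.\ a\ x : \tau \to \sigma \]

provided $a \neq \lambda x{:}\tau.\ b$. The usual rule

\[ \frac{\Gamma \vdash a \to_\eta b : \tau \to \sigma \quad \Gamma \vdash c : \tau}{\Gamma \vdash a\ c \to_\eta b\ c : \sigma} \]

is only allowed under the proviso $b \neq \lambda x{:}\tau.\ a\ x$.

3. Surjective pairing $\to_{\mathrm{sp}}$ is defined as the quasi-compatible closure (see below)
of the rule

\[ \Gamma \vdash a \to_{\mathrm{sp}} [l_1 = a.l_1, \ldots, l_n = a.l_n] : [l_1 : \tau_1, \ldots, l_n : \tau_n] \]

provided $a \neq [l_1 = a_1, \ldots, l_n = a_n]$. The usual rule

\[ \frac{\Gamma \vdash a \to_{\mathrm{sp}} b : [l_1 : \tau_1, \ldots, l_n : \tau_n]}{\Gamma \vdash a.l_i \to_{\mathrm{sp}} b.l_i : \tau_i} \]

is only allowed under the proviso $b \neq [l_1 = a.l_1, \ldots, l_n = a.l_n]$.

4. $\mu$-expansion $\to_\mu$ is defined as the quasi-compatible closure (see below) of the
rule

\[ \Gamma \vdash a \to_\mu \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} : d[\vec{\tau}] \]

provided $a \neq c[\vec{\tau}]\ \vec{b}$ and $a \neq \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a'\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\}$.
The usual rule

\[ \frac{\Gamma \vdash a \to_\mu a' : d[\vec{\tau}]}{\Gamma \vdash \mathrm{case}^{\sigma}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{\vec{c} \Rightarrow \vec{b}\} \to_\mu \mathrm{case}^{\sigma}_{d[\vec{\tau}]}\ a'\ \mathrm{of}\ \{\vec{c} \Rightarrow \vec{b}\} : \sigma} \]

is only allowed under the proviso $a' \neq \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\}$.

5. Typed full-reduction $\to_{\mathrm{full}}$ is defined as the union of basic, $\eta$, sp, $\mu$-reduction,
i.e.

\[ \Gamma \vdash a \to_{\mathrm{full}} b : \sigma \ \Leftrightarrow\ \Gamma \vdash a \to_{\mathrm{basic},\eta,\mathrm{sp},\mu} b : \sigma \]

6. $\twoheadrightarrow_{\mathrm{full}}$ and $=_{\mathrm{full}}$ are respectively defined as the reflexive-transitive and the
reflexive-symmetric-transitive closures of $\to_{\mathrm{full}}$.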

Several points deserve attention: 

1. the various restrictions on $\to_\eta$, $\to_{\mathrm{sp}}$ and $\to_\mu$ are required to enforce strong
normalization. Without those restrictions, one would have loops or infinite
reductions, see the appendix.

2. unlike the traditional formulations of $\eta$-expansion, we do allow $\eta$-expansions
on $\lambda$-abstractions at type $\tau \to \sigma$ if the type of the variable is not $\tau$. Such
a possibility is indeed crucial for expressions of type $\sigma \to \tau$ to reduce to
an expression of the form $\lambda x{:}\sigma.\ e$ at that type. On the other hand, note
that $\eta$-expansion as defined here does not preserve $\to_{\mathrm{basic}}$-normal forms. For
example, for $\tau \le \sigma$,

\[ \Gamma \vdash \lambda x{:}\sigma.\ x : \tau \to \sigma \]

is in $\to_{\mathrm{basic}}$-normal form but

\[ \Gamma \vdash \lambda x{:}\sigma.\ x \ \to_\eta\ \lambda z{:}\tau.\ (\lambda x{:}\sigma.\ x)\ z \ \to_\beta\ \lambda z{:}\tau.\ z : \tau \to \sigma \]

A similar remark applies to records and case-expressions.

3. $\eta$-like rules for datatypes seem to have received very little attention in the
literature. As far as we know, only Ghani [16] proposes a possible such rule
(his rule is motivated by categorical considerations) but does not study it in
detail. Our expansion rule for datatypes is weaker than the one suggested
by Ghani [16] and thus is inadequate to capture the categorical view of
datatypes as initial algebras in a suitable category. It nevertheless serves its
purpose, see Proposition 7.

4. reduction is not preserved under subsumption: that is, one may have

\[ \Gamma \vdash a \to_{\mathrm{full}} b : \sigma \ \land\ \Gamma \nvdash a \to_{\mathrm{full}} b : \tau \]

for $\sigma \le \tau$. On the other hand,

\[ \Gamma \vdash a \to_{\mathrm{full}} b : \sigma \ \Rightarrow\ \Gamma \vdash a =_{\mathrm{full}} b : \tau \]

for $\sigma \le \tau$.

5.3 Preservation of confluence and strong normalization 

Expansive extensionality rules preserve the fundamental properties of $\lambda_{\to,[],\mathrm{data}}$.

Proposition 5 (Strong normalization). The relation $\to_{\mathrm{full}}$ is strongly
normalizing on typable expressions.

Proof. By modifying, along the lines of e.g. [20], the computability argument of
Proposition 3.

Proposition 6 (Confluence). The relation $\to_{\mathrm{full}}$ is confluent on typable
expressions.

Proof. Using Newman’s Lemma, strong normalization and weak confluence, which
is proved by a case analysis on the possible critical pairs.

5.4 Theory of canonical inhabitants 

Below we write $\Gamma \vdash_{\mathrm{nf}} a : \tau$ if $\Gamma \vdash a : \tau$ and there is no $b \in \mathcal{E}$ such that
$\Gamma \vdash a \to_{\mathrm{full}} b : \tau$. The following result shows that the theory of canonical
inhabitants is well-behaved, i.e. that typable closed expressions in normal form
have the expected shape.

Proposition 7. Assume that $\Gamma \vdash_{\mathrm{nf}} a : \tau$.

1. If $\tau = \sigma \to \rho$, then $a = \lambda x{:}\sigma.\ b$;

2. If $\tau = [l_1 : \sigma_1, \ldots, l_n : \sigma_n]$, then $a = [l_1 = b_1, \ldots, l_n = b_n]$;

3. If $\tau = d[\vec{\sigma}]$, then $a = c[\vec{\sigma}]\ \vec{b}$.

Proof. By a case analysis on the possible normal forms.

The above result may be seen as evidence that the $\eta$, sp, $\mu$-rules restore a
semantical justification of the system, and in particular of the case-expressions:
as every canonical inhabitant of $d[\vec{\tau}]$ is of the form $c[\vec{\tau}]\ \vec{b}$, it is justified to do
pattern-matching on $c$.

6 Adding fixpoints 

$\lambda_{\to,[],\mathrm{data}}$ has a very restricted computational power. In particular, it does not
support recursion. In this section, we study an extension of $\lambda_{\to,[],\mathrm{data}}$ with
fixpoints, and show the resulting calculus to be confluent.

Definition 7.

1. The set of expressions $\mathcal{E}$ is extended with the clause $\mathrm{fix}\ x{:}\tau.\ a$.

2. Fixpoint reduction $\to_{\mathrm{rec}}$ is defined as the compatible closure of the rule

\[ \mathrm{fix}\ x{:}\tau.\ a \ \to_{\mathrm{rec}}\ a\{x := \mathrm{fix}\ x{:}\tau.\ a\} \]

3. The typing system is extended with the rule:

\[ \frac{\Gamma, x : \tau \vdash a : \tau}{\Gamma \vdash \mathrm{fix}\ x{:}\tau.\ a : \tau} \]

4. We let $\to_{\mathrm{full+rec}}$ denote $\to_{\mathrm{full}} \cup \to_{\mathrm{rec}}$.

We have:

Proposition 8. The relation $\to_{\mathrm{full+rec}}$ is confluent on typable expressions.

Proof. Using a standard technique due to Lévy [23], and exploited e.g. in [14].
The idea is to introduce bounded fixpoints, show that the calculus remains
strongly normalizing and confluent, and then use some elementary reasoning
on abstract reduction systems to conclude that $\to_{\mathrm{full+rec}}$ is confluent.

Obviously, $\to_{\mathrm{full+rec}}$ is not strongly normalizing. In order to preserve strong
normalization, one must restrict oneself to guarded fix-expressions. Technically,
this is achieved by defining the notion of an expression $e$ being guarded, and by
adding the side-condition "$a$ is guarded" in the typing rule for fixpoints. A precise
description of the guard mechanism may be found for example in [17].

7 Conclusion and directions for further work 

In this paper, we have introduced a simply typed $\lambda$-calculus with record types
and parametric datatypes. The calculus supports a combination of record sub- 
typing and constructor subtyping and thus provides a flexible type system. We 
have shown the calculus to be well-behaved, in particular with respect to canon- 
ical inhabitants. 

In the future, we intend to study definitions for $\lambda_{\to,[],\mathrm{data}}$ and its extensions.
Our goal is to aggregate a theory of definitions which is flexible enough to support
overloaded definitions, such as multiplication $*$:

\[
\begin{array}{l}
= *_2 : E \to N \to E \\
= *_3 : O \to O \to O \\
= *_4 : N \to N \to N
\end{array}
\]

where each *i is defined using case-expressions and recursion. As suggested by 
the above example, the idea is to allow identifiers to stand for several functions 
that have a different type. To do so, several options exist: for example, one 
may require the definitions to be coherent in a certain sense. Alternately, one 
may exploit some strategy, see e.g. [10,21], to disambiguate the definitions. Both 
approaches deserve further study. 

Furthermore, we intend to scale up the results of this paper to more complex 
type systems. 

1. Type systems for programming languages: in line with recent work on the
design of higher-order typed (HOT) languages, one may envisage extending
$\lambda_{\to,[],\mathrm{data}}$ with further constructs, including bounded quantification [9],
objects [1], bounded operator abstraction [11]. We are also interested in scaling
up our results to programming languages with dependent types such as DML
[33]. The DML type system is based on constraints, and hence it seems
possible to consider constructor subtyping on inductive families, as for example
in $X_i \le X_j$ if $i \le j$ where $X_i$ is the type $\{0, \ldots, i\}$. Extending constructor
subtyping to inductive families is particularly interesting to implement type
systems with subtyping.

2. Type systems for proof assistants: the addition of subtyping to proof assis- 
tants has been a major motivation for this work. Our next step is to inves- 
tigate an extension of the Calculus of Inductive/Coinductive Constructions, 
see e.g. [17], with constructor subtyping. As suggested in [5,12], such a
calculus seems particularly appropriate to formalize Kahn’s natural semantics
[22].

In yet a different direction, it may be interesting to study destructor subtyping,
a dual to constructor subtyping, in which an inductive type $\sigma$ is a subtype of
another inductive type $\tau$ if $\sigma$ has more destructors than $\tau$. The primary example
of destructor subtyping is of course record subtyping, as found in this paper. We
leave for future work the study of destructor subtyping and of its interaction 
with constructor subtyping. 
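
Loops and infinite reductions for unrestricted
extensionality rules

For $\eta$-expansion:

\[ \Gamma \vdash a\ c \ \to_\eta\ (\lambda x{:}\tau.\ a\ x)\ c : \tau \to \sigma \]

For surjective pairing (we treat the case where $a : [l : \tau, l' : \sigma]$ but a similar
remark applies to arbitrary records):

\[ \Gamma \vdash a.l \ \to_{\mathrm{sp}}\ [l = a.l, l' = a.l'].l \ \to_\pi\ a.l : \tau \]

For $\mu$-expansion (if we allow constructors to be expanded):

\[ \Gamma \vdash (c_i[\vec{\tau}]\ \vec{b}) \ \to_\mu\ \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ (c_i[\vec{\tau}]\ \vec{b})\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \ \to_\iota\ c_i[\vec{\tau}]\ \vec{b} : d[\vec{\tau}] \]

and (if we allow case-expressions to be expanded):

\[
\begin{array}{rl}
\Gamma \vdash & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} : d[\vec{\tau}] \\
\to_\mu & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_1\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \\
\to_\mu & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_2\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \\
\to_\mu & \ldots
\end{array}
\]

where $a_0 = a$ and

\[ a_{i+1} = \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_i\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \]

and (if we take the compatible closure of $\to_\mu$):

\[
\begin{array}{rl}
\Gamma \vdash a \ \to_\mu & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} : d[\vec{\tau}] \\
\to_\mu & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_1\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \\
\to_\mu & \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_2\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \\
\to_\mu & \ldots
\end{array}
\]

where $a_0 = a$ and

\[ a_{i+1} = \mathrm{case}^{d[\vec{\tau}]}_{d[\vec{\tau}]}\ a_i\ \mathrm{of}\ \{c_1 \Rightarrow c_1[\vec{\tau}] \mid \ldots \mid c_n \Rightarrow c_n[\vec{\tau}]\} \]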

Abstract. Safety of interoperation of program fragments written in different safe
languages may fail when the languages have different systems of computational 
effects: an exception raised by an ML function may have no valid semantic in- 
terpretation in the context of a Safe-C caller. Sandboxing costs performance and 
still may violate the semantics if effects are not taken into account. We show that 
effect annotations alone are insufficient to guarantee safety, and we present a type 
system with bounded effect polymorphism designed to verify the compatibility 
of abstract resources required by the computational models of the interoperating 
languages. The type system ensures single address space interoperability of stat- 
ically typed languages with effect mechanisms built of modules for control and 
state. It is shown sound for safety with respect to the semantics of a language 
with constructs for selection, simulation, and blocking of resources, targeted as 
an intermediate language for optimization of resource handling. 



1 Introduction 

Component-based software development promises the freedom to choose the most suit- 
able language independently for each fragment in a system, as long as the language 
implementation supports a common interface [12,15]. The existing interfaces offer a 
trade-off between safety and efficiency of interlanguage communication. The commu- 
nicating programs may reside in separate address spaces, delegating the responsibil- 
ity for correctness of their interaction to the operating system, which imposes severe 
performance penalties even for components using the same language implementation. 
Alternatively the caller and callee may share the same address space; this provides for 
fast interaction but typically fails to prevent possible errors due to inconsistency of the 
computational models. With the increasing dependence on component libraries the ef- 
ficiency of the interoperation mechanism is becoming a significant factor; on the other 
hand the use of third-party components, especially in the context of dynamic linking, 
requires strong safety guarantees. 

* This research was sponsored in part by the DARPA ITO under the title “Software Evolution using 
HOT Language Technology,” DARPA Order No. D888, issued under Contract No. F30602-96- 
The views and conclusions contained in this document are those of the authors and should not
be interpreted as representing the official policies, either expressed or implied, of the Defense 
Advanced Research Projects Agency or the U.S. Government. 

S.D. Swierstra (Ed.): ESOP/ETAPS’99, LNCS 1576, pp. 128-146, 2002. 
Interoperation between fragments with different computational models also occurs
when a higher-level language is being used for systems programming. Demands for
independence of some language features (e.g. garbage collection [11]) on parts of the system
can be satisfied by compiling them using a different model for a subset of the language, 
and using an interlanguage protocol to communicate with the rest of the system. This 
approach can also be used when compiling finer-grained program fragments written in 
the same language, to achieve pay-as-you-go efficiency of language features using the 
most cost-effective model which supports the features employed by the particular frag- 
ment. On a higher level, different source languages may have a common representation 
in a typed intermediate language [16] and different but interoperating implementation 
schemes, tuned to specific source language characteristics. Safety and efficiency of in- 
teroperation are again definite but competing requirements in these situations. 

To provide the basis for a safe and efficient interlanguage operation, in this paper 
we describe a novel type-based technique, supporting principled interoperation among 
languages with different features selected among mutable store, exceptions, first-class
continuations, and heap and stack allocation of activation records. Our framework allows
programs written in multiple languages with overlapping features to interact with each 
other safely and reliably, yet without restricting the expressiveness of each language. 
Thus our goal is designing an interoperability scheme which is 

- safe: it should not be possible to violate the runtime safety of a language by calling a 
foreign function, even if this function is defined in a language with different features; 
and 

- efficient: a language implementation should not be forced to use suboptimal methods 
for its own features in order to provide support for other languages’ features. For 
instance the implementation of a language that does not have exceptions should not 
have to know about the exception handling mechanism(s) used in interoperating 
implementations of other languages. 

Ideally we would like to have complete interoperability, allowing us to invoke any 
function from any term written in another language as long as the semantics of this 
invocation is defined in the “union” of the languages. However this requirement poses 
serious efficiency problems, since to satisfy it, the implementation of each language 
should be aware of the supporting mechanisms for all features in the union language. 
Thus, for instance, if a Scheme implementation S employs heap-based allocation of 
activation records, an implementation of Safe-C which may have to interoperate with S 
cannot use a stack; or, conversely, the Scheme implementations will be forced to use an 
allocation strategy compatible with a stack [1,5]. 

At the other extreme are interfaces like COM [15] which impose few restrictions 
on language implementations. Safety is to be ensured via sandboxing, using separate 
address spaces. The interoperation mechanism supports only basic language features, 
e.g. function invocation and passing of arguments and result. In cases when the cost 
of cross-domain calls is acceptable this appears as a reasonable solution, but in fact 
depending on design choices in the implementations it may not provide the expected 
semantics. 
ML:

    exception E
    fun callback () = ... raise E ...
    fun MLMain () =
      J.f (0, callback)
      handle E => J.f (1, callback)

Java:

    class J {
      public static
      void f (int i, Callback c)
          throws Exception {
        if (i == 1) throw new Exception();
        try { c.invoke(); }
        catch (Exception e) { ... }
      }
    }

Fig. 1. Failure of simple sandboxing to preserve semantics




Consider the schematic example shown in Figure 1, where raising exception E in
callback shortcuts the flow through the Java fragment. If the Java implementation 
maintains information about the last entered try (i.e. the current exception handler) 
which is context-switched upon calls to ML, this information will be incorrect (with 
respect to the “union” semantics) when Exception is thrown after the second call to 
J.f. 
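
The failure described above can be replayed in a small simulation. The sketch below is only an illustration under stated assumptions: the names `java_try_stack`, `java_throw`, `J_f`, and `MLRaise` are hypothetical, and the Java runtime is assumed to record entered try blocks in a private stack that the ML exception mechanism knows nothing about.

```python
class MLRaise(Exception):
    """Models an ML exception that unwinds using ML's own mechanism."""


# Assumption: the Java runtime's private record of entered try blocks.
java_try_stack = []


def java_throw(call_id):
    # A Java throw is dispatched to the innermost try the runtime believes is active.
    if java_try_stack:
        return f"call {call_id}: caught by the try of call {java_try_stack[-1]}"
    return f"call {call_id}: propagated to the caller"


def J_f(call_id, i, callback):
    if i == 1:
        return java_throw(call_id)
    java_try_stack.append(call_id)  # enter the try block
    callback()                      # an ML raise skips the pop below
    java_try_stack.pop()            # leave the try block
    return f"call {call_id}: returned normally"


def callback():
    raise MLRaise("E")


def ml_main():
    try:
        return J_f(1, 0, callback)
    except MLRaise:                 # handle E => J.f (1, callback)
        return J_f(2, 1, callback)


result = ml_main()
print(result)
```

In this model the second call is dispatched to the handler left behind by the first call, instead of propagating to the ML caller as the "union" semantics would require.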

Thus as observed in [14] the function of a mechanism for safe interlanguage calls is 
more than marshalling values between representations - it must also take into account 
the effect systems of the languages. 

The contribution of this paper is that it formalizes the notion of safe interoperabil- 
ity between statically typed languages by building on previous work on effect sys- 
tems [8,18,19] and introducing a type system which relates effects to the machine 
resources they require. Since different models of languages provide different sets of 
resources, and the same effect may be possible with various sets of resources, both an 
effect and a resource annotation are needed: the resource annotation indicates the spe- 
cific requirements of a code fragment, while the effect annotation determines whether 
an alternative set of resources can be coerced to match these requirements. 

Tracking the effects of the parameters of higher-order functions is achieved by effect 
polymorphism: using effect variables in the types to express the dependencies. More 
specifically our system has bounded effect polymorphism to reject effect applications 
when the effect arguments are unsupported by the resource bounds. We omit type poly- 
morphism from the present description for brevity; we believe it is largely orthogonal to 
the treatment of resources and effects in types. 

Furthermore, to show soundness of our type system, we introduce a typed language 
with constructs for explicit management of machine resources. Using this language as 
intermediate in compiling various source languages allows the compiler to optimize 
interlanguage calls by “floating” the boundary between contexts with different resource 
requirements. Thus parts of a program written in one language can be specialized to 
operate with the resources provided by the implementation of another language [17]. 

The result is that our system avoids the safety traps and allows for the interoperation 
of efficient implementations by restricting, in some cases, which foreign functions may 
be invoked, and imposing conditions the caller must satisfy before the invocation. To 
determine whether a call is possible, we consider the effects of a function, i.e. its use
of resources. A resource has to be saved (blocked) when it is not required by the callee.
If a resource which the caller does not have is needed by the called function, but the
latter does not produce an effect depending on this resource, a dummy resource can be
provided instead. In some cases it is possible to switch to alternative resources supporting
the same effect. An example illustrating these points is shown in Figure 2, where three
languages L1, L2, and L3 are built out of the semantic modules studied in this paper.
For example, calling an L3 function from an L2 program is always possible, because the
functionality of all resources of L3 is supported in L2, but the calling convention must
be switched from heap to stack based, and the L2 exception handler must be preserved.

Fig. 2. Language interoperation: conditions for safety and efficiency

To specify formally and prove the safety of our system, in Section 2 we introduce 
the typed intermediate language TZ and describe its static and dynamic semantics in 
Section 3. In Section 3.3 we show that the type system of TZ guarantees the runtime 
safety of type-correct programs. This makes possible the safe linking of separately 
compiled components which can be shown to have a type-correct translation into TZ 
with the corresponding interface type. 

2 A Language with Machine Resource Control 

The language TZ is an idealized version of the typed intermediate language of our system.
The novel features in it are the abstract resources, which together with their associated
primitive operations can be seen as modules from which we can build sublanguages of
TZ; indeed this is an implied goal of our interoperability scheme. Both the static and
the dynamic semantics of TZ permit a presentation in which new functional blocks are
added to a basic language without interference with other blocks; the exception is the
exceptions block, whose semantics needs support from both first-class continuations
(when present) and the resource handling itself. We take the approach of presenting all
blocks in one step mainly due to space constraints.

Abstract Resources
  continuation   ContRes ∋ a ::= S | H        continuation stack and heap
  control        CtrlRes ∋ c ::= a | X        ... plus an exception handler
  primitive      PrimRes ∋ r ::= c | M        ... plus a mutable store

Resource Descriptors
                 ρ ∈ ResourceDesc = CtrlRes × …

Effects
                 u ∈ EffVar
                 PrimEff ∋ f ::= callcc | exception | store
                 ε ∈ Effects = …

Types
                 Typ ∋ τ ::= τ →^ρ_ε τ | cont^ρ[τ] | ∀u≤ρ. τ
                           | exn | ref[τ] | unit | b          b ∈ BasicTyp

Values and Terms
                 Val ∋ v ::= x | d                            x ∈ Var, d ∈ Const ⊇ {*}
                           | λ^ρ x:τ. e                       resource-specific abstractions
                           | Λu≤ρ. v                          bounded effect abstractions
                           | x[ε]                             effect applications
                 Exp ∋ e ::= @x x'                            applications
                           | use (ρ) e                        resource control
                           | ⌊v⌋                              values
                           | let x : τ ← e in e'              bindings
                           | ref x | ! x | x := x'            store
                           | callcc x | throw[τ] x x'         continuations
                           | e handle x : exn. e' | raise[τ] x    exceptions

Fig. 3. Syntax of TZ, a language with typed resource control

The abstract resources, ranged over by r (see Figure 3), are structured in a hierar- 
chy including the control resources c and their subdivision, the continuation allocation 
resources a. The language supports two allocation strategies for activation records: a 
stack-based discipline with abstract resource S, and a heap-based, with abstract resource 
H. An additional control resource is the exception handler X. Informally all of the control
resources can be viewed as structures of frames such as activation records. A primitive
resource not related to the control is the store M; the system can be directly extended
with multiple versions of the store which can be controlled separately. 

Primitive resources are the building blocks of the resource descriptors ρ, consisting
of two components. The first component specifies the “calling convention” in use, i.e.
how values are communicated to and from a term; to keep the system simple we only
consider conventions on returning the result, with a choice between a stack-allocated 
and a heap-allocated continuation. The second component describes the set of resources 
available or required for the evaluation of a term. 

The counterparts of resources are the effects ε, which are sets of primitive effects
(informally caused by the corresponding primitive operations in the language) and effect 
variables which stand for sets of effects. In our language the primitive effects are callcc, 
exception, and store. A computation may only introduce effects which are provided for 
by the available resources, e.g. the effect exception can only occur when the exception 
handler resource X is available. 

There are only minimal requirements for resources needed to produce an effect; 
extra resources can simply be ignored. This intuitive observation leads to the definition 
of a relation of compatibility between resource descriptors: letting rs range over sets 
of primitive resources, (a, rs) ⊑ (a, rs') if rs ⊆ rs'. Note that compatible resource
descriptors denote the same calling convention. 
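
As a minimal sketch of this relation, assuming a resource descriptor is modelled as a pair of a calling-convention name and a frozen set of primitive resource names (the helper names `desc` and `compatible` are hypothetical, not taken from the paper):

```python
def desc(convention, *resources):
    """Build a descriptor (a, rs): a calling convention plus a set of resources."""
    return (convention, frozenset(resources))


def compatible(lo, hi):
    """Return True when lo = (a, rs) and hi = (a', rs') satisfy a == a' and rs ⊆ rs'."""
    (a, rs), (a2, rs2) = lo, hi
    return a == a2 and rs <= rs2


scheme = desc("H", "H", "M")
ml = desc("H", "H", "X", "M")
stack_lang = desc("S", "S", "M")

print(compatible(scheme, ml))      # extra resource X can simply be ignored
print(compatible(ml, scheme))      # X would be missing
print(compatible(stack_lang, ml))  # different calling conventions
```

The convention check reflects the remark that compatible descriptors always denote the same calling convention; changing the convention is the job of the use construct, not of this relation.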

The types τ include function types τ1 →^ρ_ε τ2 annotated with a resource descriptor ρ
and effects ε. This notation describes a function whose evaluation requires the calling
convention and resources denoted by ρ, and produces the effects ε. Similarly the
resource annotation ρ on the type of continuations cont^ρ[τ] denotes the resources needed
to re-activate the continuation; the type system assumes that the effect of invoking a
continuation is the maximal possible under ρ.

Among the types are also the bounded-effect quantified types ∀u ≤ ρ. τ. An effect
application x[ε] of a variable x of this type is only valid when the effects ε are possible
with the resources described in ρ.

In addition to effect applications the values v of TZ include bounded effect abstrac-
tions, and abstractions annotated with resource descriptors with the meaning noted for 
function types above. 

A term e which requires the resources described in ρ and produces effects ε can be
visualized as the element shown in Figure 4(a), where the S indicates that the convention for
the result is to use a stack. The term ⌊v⌋ denotes an effect-free computation returning
the value v according to a convention determined by the context. The computation of
let x : τ ← e in e' merges the effects of the computations of e and e' with x bound to
the value of e (Figure 4(b)).



(a) A term requiring resources ρ and producing effects ε
(b) Evaluation of let x : τ ← e in e' with resources ρ

Fig. 4. Terms and their composition
(a) restricting the set of resources
(c) switching calling conventions

Fig. 5. Operations on resources performed by use (ρ) e




A novelty is the resource-management construct use (ρ) e, which evaluates e in the
context of resources described by ρ, replacing the current resources. For instance, it
is possible to reduce the set of available resources ρ' when a term e expects a subset
ρ (as reflected in its type, see Section 3 for details). As alluded to in Figure 5(a), the
resources currently available but not needed are preserved during the evaluation of e
and restored upon completion. For example, consider the case of an ML program with
resources (H,{H,X,M}) invoking a Scheme function f with an integer argument x and
integer result. Assuming both implementations perform heap allocation of activation
records, the call would require preserving the ML exception handler. Translated to TZ,
the invocation is expressed by

    use ((H,{H,M})) (@f x)

in a type environment including x : Int, f : Int →^{(H,{H,M})}_ε Int (for some effect ε).

It is also useful to allow the creation of a new dummy resource r when it is required by
a term e but the effects of e make no use of r. Consider a Scheme fragment with resources
(H, {H, M}) invoking a function g, compiled from ML, of type Int →^{(H,{H,X,M})}_{store} Int. This
type shows that g has no effects that require the exception handler, but the code for g
nevertheless expects an exception handler resource, which is reflected in the resource
component of g’s type. Therefore the invocation must be enclosed in a use, constructing
a dummy X resource:

    use ((H,{H,X,M})) (@g 5)

This situation is represented graphically in Figure 5(b), where the effects ε are supported
by the resources ρ', but the term e requires additional resources.
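
The two examples above can be summarised by comparing the descriptor of the context with the descriptor named in the use. The following sketch is an informal reading of the text, not the formal validity conditions of Section 3; the function name `plan_use` and the dictionary keys are hypothetical.

```python
def plan_use(current, target):
    """Describe what entering `use (target) e` involves in a context `current`.

    Descriptors are pairs (convention, frozenset of primitive resource names).
    Effect-based validity conditions are deliberately not checked here.
    """
    (a, rs), (a2, rs2) = current, target
    return {
        "block": rs - rs2,    # available but not required: saved, restored afterwards
        "create": rs2 - rs,   # required but not available: dummy resources
        "switch": None if a == a2 else (a, a2),  # change of calling convention
    }


ml = ("H", frozenset({"H", "X", "M"}))
scheme = ("H", frozenset({"H", "M"}))

ml_calls_scheme = plan_use(ml, scheme)   # use ((H,{H,M})) (@f x)
scheme_calls_ml = plan_use(scheme, ml)   # use ((H,{H,X,M})) (@g 5)
print(ml_calls_scheme)
print(scheme_calls_ml)
```

Under this reading, the ML caller blocks its exception handler X, while the Scheme caller creates a dummy X for the ML callee.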

In the absence of continuation capture effects the creation of new stack and heap 
resources has different semantics due to the localization of allocation effects on these 
resources. While a newly created store or exception handler resource cannot be used at all 
(since any use would create an effect which requires that resource, hence the term would 
not have a type in an environment which does not provide the resource), a new stack or 
heap may be used for the allocation of activation records, because the allocation effect 
is localized and will not be propagated when the evaluation of the term is completed. 
Note that this only applies to the use of the heap for stack-like management of activation 
records; use of callcc for instance introduces an effect which makes it impossible to 
type the term in an environment which does not have a heap resource. 
    let id : τ →^A_∅ τ ← ⌊λ^A x:τ. ⌊x⌋⌋
    in let wrap : ∀u ≤ Both. (τ →^A_u τ) →^B_∅ τ →^B_u τ
          ← ⌊Λu ≤ Both.
               λ^B f : τ →^A_u τ. λ^B x:τ.
                 use (Either)
                   use (A) (@f x)⌋
    in let throw42 : τ →^B_{callcc} Int
          ← ⌊λ^B k:τ. let id_H : (τ →^B_∅ τ) ← @(wrap[∅]) id
                       in let k' : τ ← @ id_H k
                       in let x : Int ← ⌊42⌋
                       in throw[Int] k' x⌋
    in callcc throw42

where

    τ      = cont^B[Int]
    A      = (S,{S,M})
    B      = (H,{H,M})
    Either = (H,{S,H,M})
    Both   = (S,{M})

Fig. 6. Example of handling of foreign objects



Another application of the construct use (ρ) e is the selection of calling convention.
Figure 5(c) illustrates this in the case of switching from stack-based to heap-based
continuations for the evaluation of e when ρ = (H, rs) and the resource descriptor of the
context is ρ' = (S, rs), where {H, S} ⊆ rs (i.e. both a heap and a stack are provided).
Formal conditions for validity of use (ρ) e are presented in Section 3.



The example in Figure 6 shows an application of use in the case of interoperation 
between programs written in two languages. The function id uses stack allocation (S), 
while throw42 uses heap allocation (H). Both languages also have the store resource M. 
Before id can be called from throw42, it must be coerced to heap allocation, which is 
performed by the effect-polymorphic function wrap. Note that the effects of the argu- 
ments of wrap are restricted by the resources in the intersection of the two languages’ 
sets, denoted by Both; in this case this means only store effects are allowed. The
invocation of wrap’s argument f is enclosed in two use regions: the outer use creates a new
stack, while the inner one switches the calling convention to the stack and saves the heap 
resource. 

The example also shows how an object which only has meaning in a language with a 
given feature can be handled “passively” by code written in a language without support 
for that feature. Note that the first-class continuation captured in the heap-allocating 
program is passed to id and back, and then activated. 
Effect Environment Formation

(Env-eff-empty)
    ⊢_Δ ∅

(Env-eff-ext)
    ⊢_Δ Δ
    -----------------
    ⊢_Δ Δ, u ≤ ρ

Type Environment Formation

(Env-typ-empty)
    ⊢_Δ Δ
    -----------------
    Δ ⊢_Γ ∅

(Env-typ-ext)
    Δ ⊢_Γ Γ    Δ ⊢_τ τ
    -----------------
    Δ ⊢_Γ Γ, x : τ

Effects

(Eff-empty)
    ⊢_Δ Δ
    -----------------
    Δ ⊢_ε ∅ ≤ ρ

(Eff-union)
    Δ ⊢_ε ε' ≤ ρ    Δ ⊢_ε ε'' ≤ ρ
    -----------------
    Δ ⊢_ε ε' ∪ ε'' ≤ ρ

(Eff-var)
    ⊢_Δ Δ    u ∈ Dom (Δ)
    -----------------
    Δ ⊢_ε u ≤ Δ(u)

(Eff-primitive)
    ⊢_Δ Δ    ρ ∈ Required (f)
    -----------------
    Δ ⊢_ε {f} ≤ ρ

(Eff-add-resource)
    Δ ⊢_ε ε ≤ ρ    ρ ⊑ ρ'
    -----------------
    Δ ⊢_ε ε ≤ ρ'

Types

(Typ-basic)
    ⊢_Δ Δ
    -----------------
    Δ ⊢_τ b

(Typ-unit)
    ⊢_Δ Δ
    -----------------
    Δ ⊢_τ unit

(Typ-fun)
    Δ ⊢_ε ε ≤ ρ    Δ ⊢_τ τ    Δ ⊢_τ τ'
    -----------------
    Δ ⊢_τ τ →^ρ_ε τ'

(Typ-poly)
    ⊢_Δ Δ    Δ, u ≤ ρ ⊢_τ τ
    -----------------
    Δ ⊢_τ ∀u ≤ ρ. τ

(Typ-ref)
    Δ ⊢_τ τ
    -----------------
    Δ ⊢_τ ref[τ]

(Typ-cont)
    Δ ⊢_τ τ    ∅ ⊢_ε {callcc} ≤ ρ
    -----------------
    Δ ⊢_τ cont^ρ[τ]

Values

(Val-const)
    Δ ⊢_Γ Γ
    -----------------
    Δ; Γ ⊢_v d : θ(d)

(Val-var)
    Δ ⊢_Γ Γ    x ∈ Dom (Γ)
    -----------------
    Δ; Γ ⊢_v x : Γ(x)

(Val-abs)
    Δ ⊢_Γ Γ    Δ ⊢_τ τ    ρ; Δ; Γ, x : τ ⊢_e e : τ'; ε
    -----------------
    Δ; Γ ⊢_v λ^ρ x:τ. e : τ →^ρ_ε τ'

(Val-eff-abs)
    Δ ⊢_Γ Γ    Δ, u ≤ ρ; Γ ⊢_v v : τ
    -----------------
    Δ; Γ ⊢_v Λu ≤ ρ. v : ∀u ≤ ρ. τ

(Val-eff-app)
    Γ(x) = ∀u ≤ ρ. τ    Δ ⊢_ε ε ≤ ρ
    -----------------
    Δ; Γ ⊢_v x[ε] : [ε/u]τ

Fig. 7. The TZ type system: effects, types and values



3 Semantics of TZ

3.1 Static Semantics 

The type system of TZ, shown in Figures 7 and 8, keeps track of the resources necessary for
the evaluation of a term and makes a conservative estimate of the effects of the evaluation.
The effect environment Δ specifies the resource bounds of free effect variables, and as
usual the type environment Γ assigns types to free variables.

The rules for sequents Δ ⊢_ε ε ≤ ρ reflect the dependence of effects on resources and
form the basis of bounded effect polymorphism. The dependence of primitive effects 
on resources is captured by the function Required (Figure 9) specifying the alternatives 
for minimal resource descriptors enabling a primitive effect. Note that the exception 
and store effects work with either stack or heap continuation allocation, while the callcc 
effect can only be introduced with heap allocation. 
(Exp-app)
    Δ ⊢_Γ Γ    Γ(x) = τ' →^ρ_ε τ    Γ(x') = τ'
    -----------------
    ρ; Δ; Γ ⊢_e @x x' : τ; ε

(Exp-val)
    Δ; Γ ⊢_v v : τ
    -----------------
    ρ; Δ; Γ ⊢_e ⌊v⌋ : τ; ∅

(Exp-use)
    ρ'; Δ; Γ ⊢_e e : τ; ε    Δ ⊢_ε ε ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e use (ρ') e : τ; ε

(Exp-let)
    ρ; Δ; Γ ⊢_e e : τ; ε    ρ; Δ; Γ, x : τ ⊢_e e' : τ'; ε'
    -----------------
    ρ; Δ; Γ ⊢_e let x : τ ← e in e' : τ'; ε ∪ ε'

(Exp-callcc)
    Δ ⊢_Γ Γ    Γ(x) = cont^ρ[τ] →^ρ_ε τ
    -----------------
    ρ; Δ; Γ ⊢_e callcc x : τ; ε ∪ {callcc}

(Exp-throw)
    Δ ⊢_Γ Γ    Δ ⊢_τ τ    Γ(x) = cont^ρ[τ']    Γ(x') = τ'
    -----------------
    ρ; Δ; Γ ⊢_e throw[τ] x x' : τ; MaxEff (ρ)

    where MaxEff (ρ) = { f | ∅ ⊢_ε {f} ≤ ρ }

(Exp-ref)
    Δ ⊢_Γ Γ    Γ(x) = τ    Δ ⊢_ε {store} ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e ref x : ref[τ]; {store}

(Exp-deref)
    Δ ⊢_Γ Γ    Γ(x) = ref[τ]    Δ ⊢_ε {store} ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e ! x : τ; {store}

(Exp-update)
    Δ ⊢_Γ Γ    Γ(x) = ref[τ]    Γ(x') = τ    Δ ⊢_ε {store} ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e x := x' : unit; {store}

(Exp-raise)
    Δ ⊢_Γ Γ    Γ(x) = exn    Δ ⊢_ε {exception} ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e raise[τ] x : τ; {exception}

(Exp-handle)
    ρ; Δ; Γ ⊢_e e : τ; ε    ρ; Δ; Γ, x : exn ⊢_e e' : τ; ε'    Δ ⊢_ε {exception} ≤ ρ
    -----------------
    ρ; Δ; Γ ⊢_e e handle x : exn. e' : τ; ε ∪ ε' ∪ {exception}

Fig. 8. The TZ type system: terms



Type judgments for values associate a type τ with a value v and a pair of
environments; values have no effects and therefore their computation requires no resources.
Sequents for terms have the form ρ;Δ;Γ ⊢_e e : τ; ε, where ρ is the resource descriptor of
the environment, τ is the type of e, and ε represents the effects of the evaluation of e. For
the typing of constants we assume the existence of a function θ ∈ Const → BasicTyp,
and in particular that θ(*) = unit.



Required (callcc) = {(H,{H})} 

Required (exception) = {(S, {S,X}), (H,{H,X})} 
Required (store) = {(S, {S, M}), (H, {H, M})} 



Fig. 9. Minimal resource requirements for primitive effects 
The function MaxEff yields the maximal effect possible with the resources in ρ; it
is used in making a conservative approximation of the effects of a continuation given 
the resources available at the point of its capture (in rule (Exp-throw)). 
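
To make the roles of Required and MaxEff concrete, the sketch below encodes the table of Figure 9 and derives MaxEff for closed descriptors. It is a simplification under stated assumptions: effect variables are ignored, descriptors are modelled as (convention, frozenset) pairs, and the names `REQUIRED`, `supports`, and `max_eff` are hypothetical.

```python
# Minimal resource descriptors enabling each primitive effect (Figure 9).
REQUIRED = {
    "callcc": [("H", frozenset({"H"}))],
    "exception": [("S", frozenset({"S", "X"})), ("H", frozenset({"H", "X"}))],
    "store": [("S", frozenset({"S", "M"})), ("H", frozenset({"H", "M"}))],
}


def compatible(lo, hi):
    """Same calling convention, and the resources of lo are included in those of hi."""
    (a, rs), (a2, rs2) = lo, hi
    return a == a2 and rs <= rs2


def supports(rho, effect):
    """A primitive effect is possible under rho if some minimal descriptor fits in rho."""
    return any(compatible(req, rho) for req in REQUIRED[effect])


def max_eff(rho):
    """All primitive effects possible with the resources in rho."""
    return {f for f in REQUIRED if supports(rho, f)}


ml = ("H", frozenset({"H", "X", "M"}))
scheme = ("H", frozenset({"H", "M"}))
stack_lang = ("S", frozenset({"S", "M"}))

print(sorted(max_eff(ml)))
print(sorted(max_eff(scheme)))
print(sorted(max_eff(stack_lang)))
```

The `supports` check combines the rules for primitive effects and for adding resources; the stack-based descriptor never admits callcc, in line with the note that callcc can only be introduced with heap allocation.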

3.2 Dynamic Semantics 

To separate the static and dynamic aspects of the use of resource descriptors we consider 
a variant of TZ with explicit resource annotations, where the abstract syntax for terms is 

    e ::= @x x' | use^ρ (ρ') e
        | ⌊v⌋^ρ | let^ρ x : τ ← e in e'
        | ref^ρ x | !^ρ x | x :=^ρ x'
        | callcc^ρ x | throw^ρ[τ] x x'
        | e handle^ρ x : exn. e' | raise^ρ[τ] x

The translation of type-correct TZ terms into the annotated language is straightforward
since the new annotations correspond to the resource descriptors ρ on the left of ⊢_e in
the typing sequents for those terms.

We present operational semantics of TZ following [3] in terms of a variant of the 
tail-call-safe CaEK machine. We prefer operational semantics because it allows us to 
show directly that reasonably efficient implementations exist. The original machine 
allocates an activation frame on the continuation stack when entering a let-binding (but
not when entering a closure), and pops the frame to complete the binding. We extend 
the machine by including a component denoting additional machine resources, and by 
providing a heap-based alternative continuation allocation strategy supporting first-class 
continuations. 

The transitions of the abstract machine are specified as a relation on machine con- 
figurations consisting of the term being evaluated, its environment, and the currently 
available machine resources; for the purposes of the proof of soundness of the type 
system we include in the configuration also the type of the evaluated term as well as 
an accumulator of effects. Having the annotations in the syntax makes it clear that,
although the semantics of some constructs depend on what resources are available, this
dependence can be resolved at compile time.

The semantic domains defining the meaning of the components of the abstract ma- 
chine are listed in Figure 10. As usual the environment E maps variables to values 
converted to their internal representation as shown in Figure 11. Values are represented 
in the environment as closures with environments binding their free variables; no en- 
vironment for effect variables is needed because effect instantiation is performed via 
substitution (Figure 11). 

To assist with the proof of soundness of the type system we instrument the operational 
semantics to keep track of the current control resource, the accumulated effects of the 
evaluation, and the type of the term being evaluated. Further, the environment is extended 
to also record the type of each variable; type safety (Lemma 1) implies that all type 
annotations and tags only used for verification in the semantics may be erased without 
affecting the outcome of the evaluation of a correctly typed term. We use the shorthands 
^vE(x) and ^τE(x) for the value and type components of E(x), respectively, and we write
E[x ↦ (w : τ)] to denote the extension of E which assigns machine value w and type
τ to x.

    γ(x, E)    = ^vE(x)
    γ(x[ε], E) = γ([ε/u]v, E')    if ^vE(x) = Closure (Λu≤ρ. v, E') and ∅ ⊢_ε ε ≤ ρ
    γ(v, E)    = Closure (v, E)   for other v

Fig. 11. Representation of values

The modularity of the language blocks is supported in our framework by the
representation of machine resources Rs as a record (tuple) with a tag tag (Rs) ∈ Resources
which yields the set of the corresponding primitive resources. The set of possible values
of an individual machine resource corresponding to abstract resource r is denoted by
MR (r). Thus, assuming a canonical enumeration of resources, a tuple of resources with
tag rs is an element of the singleton separated sum Σ_{rs'∈{rs}} ⊗_{r∈rs'} MR (r); we
denote this set by MR* (rs). The semantics make use of two families of total functions,

    inj^r_{rs−{r}}  ∈ MR (r) × MR* (rs − {r}) → MR* (rs ∪ {r})
    proj^r_{rs∪{r}} ∈ MR* (rs ∪ {r}) → MR (r) × MR* (rs − {r}),

indexed by a primitive abstract resource r and a tag rs, with the intention that inj^r_{rs−{r}}
maps (R, Rs) to the record containing R and all of Rs, and proj^r_{rs∪{r}} is its inverse. We
also use single^r_rs = π1 ∘ proj^r_rs and drop^r_rs = π2 ∘ proj^r_rs. The functions inj^r_rs and inj^{r'}_rs
(and similarly proj) are commutative for r ≠ r', {r, r'} ∩ rs = ∅; this commutativity
allows the generalization of inj and proj to the functions

    inj^{rs'}_{rs−rs'}  ∈ MR* (rs') × MR* (rs − rs') → MR* (rs ∪ rs')
    proj^{rs'}_{rs∪rs'} ∈ MR* (rs ∪ rs') → MR* (rs') × MR* (rs − rs').

Using these functions we can define natural liftings of operations linear in an
individual resource R (i.e. in whose type R occurs exactly once positively in both the types of
domain and codomain, or does not occur in the domain type and the codomain is linear in
R) to operations on a record of resources Rs containing R: e.g. for f ∈ S × MR (r) →
S' × MR (r) we define f^r_rs ∈ S × ⊗_{r∈rs} MR (r) → S' × ⊗_{r∈rs} MR (r) by

    f^r_rs (s, Rs) = let (R, Rs') = proj^r_rs (Rs),
                         (s', R') = f (s, R)
                     in (s', inj^r_{rs−{r}} (R', Rs'))

where r ∈ rs. With the generalized versions of inj and proj this lifting extends to
functions on records of resources as well.
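
The lifting construction can be illustrated with a record of resources modelled as a Python dictionary keyed by resource name. This is a sketch under that modelling assumption; the names `proj`, `inj`, `lift`, and the sample operation `alloc` are illustrative and not the paper's definitions.

```python
def proj(r, record):
    """Split a record into the individual resource r and the remaining record."""
    rest = {name: value for name, value in record.items() if name != r}
    return record[r], rest


def inj(r, value, rest):
    """Inverse of proj: put resource r back into a record that lacks it."""
    assert r not in rest
    return {**rest, r: value}


def lift(f, r):
    """Lift f : (s, R) -> (s', R') on one resource to whole records containing r."""
    def lifted(s, record):
        R, rest = proj(r, record)
        s2, R2 = f(s, R)
        return s2, inj(r, R2, rest)
    return lifted


def alloc(value, store):
    """Sample operation on the store resource alone: allocate a new cell."""
    return len(store), store + [value]


alloc_on_record = lift(alloc, "M")
before = {"H": [], "M": [7]}
location, after = alloc_on_record(42, before)
print(location, after)
```

Only the store component changes; the other resources pass through untouched, which is the sense in which an operation linear in one resource lifts naturally to the record.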

For each primitive resource r there is an initial element empty^r of the algebra of the
corresponding machine resource; these elements are used in the simulation of resources
(rule (add) in Figure 13).

A generic continuation machine resource C^c (corresponding to abstract control
resource c) is described by the four stack operations empty^c, isEmpty^c, newFrame^c, and
topFrame^c, where topFrame^c is the inverse of newFrame^c on non-empty C^c. We will
use the more concise notation F Rs both for newFrame^c_rs (F, Rs) and as a pattern
standing for Rs' when c ∈ rs, isEmpty^c_rs (Rs') = false, and topFrame^c_rs (Rs') =
(F, Rs). A section (F ) stands for the function mapping Rs to F Rs.

The stack-based implementation only provides the minimal functionality; the heap- 
based implementation can be used for non-sequential access as well. Exception handling 
uses a separate continuation stack. 

Activation frames include a variant of the standard binding frames [3] to record the
continuation completing a let-binding $\mathbf{let}\ x : \tau' = e'\ \mathbf{in}\ e$ after evaluating $e'$.

The binding frame is of the form $\mathrm{Bind}\,(\lambda^{\langle a, rs \rangle} x{:}\tau.\ e,\ E,\ \tau')$, where $\tau'$ is the type of
e in environment E extended with $x : \tau$, rs is the set of resources available for the
evaluation of the let, and a is the allocation resource for the evaluation of e.

In addition there are two kinds of resource management continuation frames. A 
Restore frame indicates that a machine resource has been saved and removed from the 
current; a Drop frame signals that a dummy resource has been created and provided to 
code which uses it only locally, or not at all (but has been compiled to expect it). The
names of these frames suggest the operations that must be done to restore the status after 
the evaluation of the current term has completed, i.e. when the frame is encountered 
during unwinding the continuation. 

The relation $\mapsto_1$ of computation on configurations is the union of the transition rules
(shown in Figures 12 and 13); the relation $\mapsto$ of computation is the reflexive transitive closure
of $\mapsto_1$. The transition rules are grouped in classes based on the feature they implement
as follows.



Building block              Rules                            Figure
environment                 (app), (let), (bind)             12
store                       (ref), (deref), (update)         12
first-class continuations   (callcc), (throw)                13
exceptions                  (handle), (raise)                13
resource management         (add), (remove), (redirect),     13
                            (nop), (restore), (drop)



Notable among the transitions are those for resource management - they save a re- 
source for future use and remove it from the currently available set, create a dummy 
resource, or select a different continuation. The interaction between resource management
and exception handling is non-trivial because resources must be restored when
exiting a use-region in any way. For that reason first-class continuations are restricted to 
accept only the resource set available at point of capture; unlike them, exception handlers 
are allocated on a stack, thus it is possible to allow exceptions not to be tied to specific 
resources by intercepting them and restoring the resources. 

Our implementation of exceptions is more realistic than the typical one [23], which propagates
exception packets through all enclosing terms whose evaluation is pending. The
exception handlers form a stack parallel to the continuation stack (or heap). Raising
an exception (rule (raise)) creates an exception packet which is processed like a value
with continuation on the exception handler stack; the bindings on this stack, created
by handle, restore the default continuation. This scheme has no overhead if exceptions
are not used, and overhead linear in the number of catching handlers when an exception is
thrown.
Environment

(app) {@xi X2, E, a, Rs, e, r) (e', E'[x' !->■ E{x2)], a, Rs, e, t) 

where Closure ;r'. e', if') = '^E{x\) 

T = ’^E{x 2 ) 

(let) a: : r' e' in e, E, a, Rs, e, r) 

(e', E, a, Bind x:r'. e, E\Fv(e)-{^},r) Rs, e, t') 

(bind) c, Bind (A<“’'-">a:;r.e', S', r') iis', e, t) h->i 

(e', E'[x 1 -^ {■y{v,E) : r)], a, Rs' , e, r') 

Store (rules valid when M G rs) 

(ref) x, E, a, Rs, e, t) i-J-i 

( [m'J E[x' !->■ (Loci : t)], a, Rs' , eU{store}, r) 
where r = ref[^(m)] 

{£,Rs') = ref,^ CE{x), Rs) 
x' ^ Dom (E) 

(deref) x, E, a, Rs, e, r) i-^-i 

([m'J E[x' !->■ {deref^ {£, Rs) : t)], a, Rs, eU {store}, r) 
where (Loc £ : ref[r]) = E{x) 

x' ^ Dom (E) 

(update) x' , E, a, Rs, e, unit) i— 

E, a, updaters {{£, ^E{x')), Rs), eU {store}, unit) 
where (Loc £ : t) = E{x) 

T = ref[^(a:')] 

where rs = tag {Rs), a £ rs 

Fig. 12. Instrumented transition rules, part 1 

3.3 Soundness of the Type System 

To prove soundness of the type system we extend it with rules for machine configurations,
and prove that this extension has the subject reduction and progress properties. The 
interesting aspect of assigning a type to a configuration in this system is that some of 
the values in the environment may refer to resources which are currently inaccessible 
(blocked by an enclosing use), which complicates the notion of type correctness. The 
details of the proof are omitted for space considerations. 

The progress and subject reduction properties are combined in the following lemma. 

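Lemma 1 (Safety). If $C = (e, E, c, Rs, \varepsilon, \tau)$ and $\emptyset \vdash_c C : \tau_0; \varepsilon_0$, then either
$e = \lfloor v \rfloor$ for some value v such that $\emptyset; \emptyset \vdash_v v : \tau$, and $isEmpty^{c}_{rs}(Rs) = true$, or
there exists a configuration $C'$ such that $C \mapsto_1 C'$ and $\emptyset \vdash_c C' : \tau_0; \varepsilon_0$.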

(Note that the case of a value in an empty continuation covers both normal termination 
of the program and the case of an unhandled exception propagated to the top.) 

As a corollary we obtain soundness of the system. 
First-class Continuations (rules valid when H £ rs)

(callcc) x, E, Rs, H, e, r) i— 

(e' , E'[x' 1-^ (Cont k : M)]; H, Rs' , e U {callcc}, r) 

where {k, K, fee) = singlets {Rs) 

Rs' = if X ^ rs then Rs else restoreExn (H)(_Rs) 

^E{x) = Closure : cont<^’™> [r]. e', LI') 

(throw) (throw^^’’’®^ [ t] ati X2, E, a, Rs, e, r) 

E, H, {{ki,K,k^, Rs'), e, t") 

where (Cont k\ : cont^'^’’'®^ [t"]) = E{xi) 

{{k, K, ke), Rs') = proj^, (Rs) 

Exceptions (rules valid when X £ rs) 

(handle)(e handle^“’''®^ x : exn. e', E, a, Rs, e, r) i— (e, E, a, Rs', e U {exception}, r) 

( restoreCont {a, Rs) o \ 

(Bind (A^“’’'“^ a;:exn. e', (e')_{a,}, t) ) ° j (Rs) 

restoreExn (a) j 

(raise) (raise^“’’^“^ [r] x, E, a, Rs, e, t) E, X, Rs, e U {exception}, exn) 

Resource Management 

(add) (use^“’™^ {{a, rs')) e, E, a, Rs, e, r) 

((a, rs')) e, E, a, inj^s {empty'’ , Drop r :;“j Rs'), e, r) 

where r G rs' — rs 

Rs' = if X ^ rs then Rs else Drop r restoreExn (a){Rs) 

(block) (use^“’™^ {(a, rs')) e, E, a, Rs, e, r) 

((a, rs')) e, E, a. Restore (r, R) Rs" , e, r) 

where r G rs — rs' — {a} 

Rs" — it X ^ rs' then Rs else Restore (r, R) restoreExn {a){Rs) 

{R, Rs') = projrs {Rs) 

(redirect) (use^“’™^ {{a' , rs')) e, E, a, Rs, e, t) 

(use<“'’’-“> {{a',rs'))e, E, a'. Bind (A<“’’'“> o::t. {a:J , E, r) Rs, e, r) 

(nop) (use^“’’’®^ {{a,rs))e, E, a, Rs, e, r) (e, E, a, Rs, e, r) 

(restore) E, c, Restore {r,R) -.-.rs Rs' , e, r) 

(H c, inj;,{R, Rs'),e, r) 

where c £ rs, r ^ rs 

(drop) E, c. Drop r Rs' , e, t) i-^i E, c, drop').„ {Rs'), e, r) 

where c G rs, r £ rs — {c} 

where rs = tag {Rs) 

a £ rs 

restoreExn {a){Rs) = replaeeResource {a, X, Rs){Rs) 
restoreCont {a, Rs') = replaeeResource (X, a, Rs') 
replaeeResource {c,r, Rs') — (Drop r ) o (Restore (r, singlets {Rs')) ■'■rs ) 

where tag {Rs') = rs, {c, r} C rs 



Fig. 13. Instrumented transition rules, part 2 
Theorem 1 (Soundness). If $C = (e, E, c, Rs, \varepsilon, \tau)$ and $\emptyset \vdash_c C : \tau_0; \varepsilon_0$, then C
computes to a value, or to an exception packet, or its computation diverges.



4 Related Work 

Interoperability is a primary concern of component-based models such as CORBA [12]
and Microsoft’s COM [15,13]. Safety in these systems is in conflict with the performance
requirements, and even sandboxing may fail to provide correct semantics when the effect
systems of the communicating languages go beyond the value/store interface. In comparison,
our system allows flexible and efficient interoperation between languages with
different resources, and ensures safety by exposing the resource and effect requirements
in the types of the components.

Foreign function call interfaces [6,14] are related in purpose but are designed under
the constraint to be compatible with legacy code which is often unsafe. Solutions to
this problem have major impact on interoperability today. We do not attempt to solve
compatibility issues in the system presented in this paper. Our design is for interoperability
between safe components with language-independent interfaces, aimed to satisfy
high performance requirements when running in shared address space. We emphasize
building a safe, efficient, and robust interface across multiple HOT languages.

The present work extends our earlier results [17] on a type system with effect and 
resource control for continuation allocation. While the state resource is essentially in- 
dependent of the rest, the interactions of the exception resource with the continuation 
resources are non-trivial. 

Although we do not present the semantics in terms of monads, the idea to use both 
resources and effects to describe a function’s interoperability was inspired by recent work 
on monad-based interactions and modular interpreters [21,9,20,10], and Wadler’s work 
on the relationship between monads and effects [22]. Monads, viewed as compositions of 
basic monad transformers [10], can be used to represent sets of resources. The transition 
rules creating a binding activation frame and binding a variable to a value (Figure 12), 
given a set of resources, provide the semantics of the bind and unit of the monad.
Resource-specific primitives are interpreted by lifting the operations of the corresponding 
monad through the monad transformers enclosing it in the composition. 

What makes our approach different is that our system keeps track of effects, which
allows us to determine which components of a monad transformer composition are being
used only trivially (i.e. only their bind and unit are invoked) and therefore can be
eliminated or simulated. Furthermore, we wish to extend and reduce the set of resources
in a commutative way, and for that purpose we represent the result of transformer compositions
“horizontally”: the corresponding resources are collected in one component
of the abstract machine configuration. Thus if the set of resources required by term e
corresponds to the monad $M = (T_1 \circ \dots \circ T_n)\,Id$, extending it via $use^{r}(e)$ is, in general,
not expressed as an application of a monad transformer T to M, but as the use of a
monad morphism [2] to embed values of M into a monad isomorphic to a composition
of $T_1, \dots, T_n, T$ in a canonical order. Proving the equivalence of monads defined as
compositions with their horizontal representations meets the technical complexity of
constructing morphisms between them. We have opted instead for operational semantics,
which gives a more direct correspondence with an implementation. In our semantics
the complexity of lifting monads through monad transformers is reflected in the interaction
between various resources, e.g. in transition rules (callcc), (add), and (block)
(Figure 13).

Closely related to our work is the research on effect systems [4,7,8,18,19]. The ef- 
fects in our system are used for verifying interoperability constraints; in this context 
the novel bounded effect quantification is introduced as a form of effect polymorphism
under resource restrictions which allows us to take advantage of the effect-resource re- 
lationship to support advanced compilation strategies. The effect inference suggested 
in the typing rules in Figure 8 is conservative and there is considerable room for im- 
provement borrowing from prior work by e.g. finer separation of effects and adopting 
region inference or for determining their localization; however it is still not clear to what 
extent this improvement will materialize in practice - for instance the exception effects 
can be easily localized to a handler, but due to the typical extensibility of the exception 
type most handlers re-raise exceptions. We intend to test these variations in a prototype
implementation under development within the FLINT system. 
Abstract. Expressions in the programming language C have such an
under-specified semantics that one might expect them to be non-deter- 
ministic. However, with the help of a mechanised formalisation, we have 
shown that the semantics’ additional constraints actually result in a large 
class of C expressions having only one possible behaviour. 



1 Introduction 

The semantics of the programming language C is specified in an ISO 
standard [3]. However, this semantics is written in natural language, and
is thus unsuitable as the basis for formal work such as verification. Indeed,
there are a number of unresolved disputes about various details in this
standard.¹

However, our Cholera formalisation [6,7] is a completely formal se- 
mantics for the bulk of the C language. It is formulated in a structural 
operational style (see, for example, [2]) and is embedded in the HOL theorem
prover [1]. On this basis, it is possible to prove facts about the
C language (modulo the degree of certainty with which one believes the 
formalisation to be correct). For example, it is possible to derive vari- 
ous “axiomatic” rules that allow one to reason about C programs with 
Hoare-like triples, as described in [5,7]. 

The work described here considers the semantics of C expressions, and 
in particular demonstrates that a significant class of these expressions are 
deterministic. This is an important result in the context of verification 
because it allows one to perform a verification with respect to just one 
possible path of execution. Otherwise, if an expression can evaluate in n 
different ways, then any verification of a program that contains it must 
demonstrate that the final post-condition holds for all n possibilities, a 
tedious task at best. 

** Fax: +44 1223 334678

¹ See for example the Usenet newsgroup comp.std.c, where issues such as whether or
not function calls may interleave are debated. 
In addition to defining the semantics of C, our Cholera project aims
to put results like this determinism theorem to work: using them in the 
verification of not entirely trivial programming examples. Such verifica- 
tion examples will support the thesis that verification of programs in 
complicated programming languages is possible, particularly if one has 
mechanical support for the task. Moreover, the fact that proofs of this 
nature are practical is an indication that even programming language 
semantics can be satisfactorily mechanised. 

The remainder of this paper will first describe the relevant parts of the 
C semantics in section 2. In section 3, we explain why what might initially 
appear to be non-deterministic is in fact deterministic, and outline the 
overall proof strategy. The proof is explained in more detail in sections 4 
and 5, and section 6 concludes. 

2 The semantics of C expressions 

Cholera models the semantics of C’s expressions with a reduction style
operational semantics using a relation $\to_e$ such that $(e_0, \sigma_0) \to_e (e, \sigma)$
holds when an expression $e_0$ in state $\sigma_0$ can take a step, becoming a new
expression $e$, and with the state changing to state $\sigma$. States $\sigma$, $\sigma_0$, $\sigma'$ etc.
embody not just the usual mapping from variables to values, but also
information about the program environment, and pending side effects.
We use $\to_e^{*}$ to denote the reflexive and transitive closure of the single
step relation.

There are two principal sources of non-determinism in the semantics. 
These are the rules for the evaluation of binary expressions and the way 
in which side effects are applied. The following two rules illustrate the 
first: 

$$\frac{(e_1, \sigma_0) \to_e (e, \sigma)}{(e_1 \odot e_2, \sigma_0) \to_e (e \odot e_2, \sigma)} \qquad \frac{(e_2, \sigma_0) \to_e (e, \sigma)}{(e_1 \odot e_2, \sigma_0) \to_e (e_1 \odot e, \sigma)}$$

Here $\odot$ stands for all of C’s arithmetic operators, but not for the
logical operators && and ||, nor the comma operator, nor the assignment
operators. The first rule says that if the first argument to a binary
operator can take a step in the semantics, then so too can the contain- 
ing expression. The non-determinism enters because both rules apply at 
all times, meaning that an expression is not constrained to evaluate its 
operands in any particular order, and may even interleave the evaluation 
of its operands. In the presence of side effects and changes to the state, 
this is potentially a significant source of non-determinism. 
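The two rules can be illustrated with a small sketch. It is a deliberate simplification and not Cholera's definition: the state is omitted, the only operator is addition, and expressions are encoded as nested tuples; the names `steps` and `normal_forms` are hypothetical.

```python
def steps(e):
    """All expressions reachable from e in one reduction step.

    An expression is either an int (a value) or a tuple ("+", left, right).
    """
    if isinstance(e, int):
        return []
    op, left, right = e
    if isinstance(left, int) and isinstance(right, int):
        return [left + right]
    out = [(op, l2, right) for l2 in steps(left)]    # first rule: step the left operand
    out += [(op, left, r2) for r2 in steps(right)]   # second rule: step the right operand
    return out


def normal_forms(e):
    """Every value that some complete evaluation of e can produce."""
    nxt = steps(e)
    if not nxt:
        return {e}
    out = set()
    for e2 in nxt:
        out |= normal_forms(e2)
    return out


example = ("+", ("+", 1, 2), ("+", 3, 4))
successors = steps(example)
values = normal_forms(example)
```

Both rules apply to `example`, so `steps` returns two successors, yet every evaluation path ends in the same value. It is only once a mutable state is threaded through the steps that the choice of order can matter.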
The second source of non-determinism in the semantics is its handling
of side effects. Side effects are generated by the evaluation of assignment
expressions and the various increment and decrement operators (++ and
--). These operators have as their side effects the writing of values into
memory, but this does not necessarily happen immediately. Instead, side
effects are applied at arbitrary times and in any order, subject only to the
constraint that all pending side effects be applied before the next sequence
point. Sequence points occur at certain well-marked stages in expression
evaluation, such as after the complete evaluation of the first argument to
the logical operators && and ||. The rule for side effect application is:

$$\frac{\eta \text{ is pending in } \sigma}{(e, \sigma) \to_e (e, \mathrm{apply\_se}(\sigma, \eta))}$$

where $\mathrm{apply\_se}(\sigma, \eta)$ denotes the state resulting from the application
of side effect $\eta$ to state $\sigma$, with the appropriate changes made (memory
updated, and $\eta$ removed from the pending side effects).

2.1 Constraints on expression evaluation 

The above description of expression evaluation suggests a chaotic picture. 
A naive interpretation would suggest that the evaluation of 



    v + v++ + v + v++

with v initially 3, could yield any value in the range 12-17. However,
the language definition imposes severe constraints on the way in which 
expressions can evaluate, and in fact, this expression is undefined. 

The constraint is that “between the previous and next sequence point 
an object shall have its stored value modified at most once by the eval- 
uation of an expression. Furthermore, the prior value shall be accessed 
only to determine the value to be stored.” [3, §6.3] (For our purposes, an 
object is best understood as simply a part of memory.) Violation of this 
constraint results in undefinedness. 

It is worth noting that this is a constraint on the dynamic behaviour 
of the program. Though the expression given above involving v will nec- 
essarily be undefined because it both refers to and updates the object 
denoted by v, it is not clear whether or not this is true of *p + (i = 
1), say, as it is impossible in general to determine whether or not *p (a 
dereferencing of pointer variable p) will refer to i. Finally, note that the 
second sentence quoted above allows references to take place if they occur
on the right-hand side of an assignment expression and the references are to
the object being modified by the assignment. For example, this clause
allows i = i + 1.

A formal semantics of C must model this constraint as well as the 
more obvious rules given earlier. To do this, Cholera keeps track of three 
state components: 

— the pending side effects

— those parts of memory which have been updated (the “update_map”)

— those parts of memory which have been referred to (the “ref_map”)

The pending side effects component is a multi-set, or bag, as the same
side effect might occur twice in a given evaluation. The update_map is a
set of addresses, as no evaluation will be allowed to update the same
location twice. The ref_map is another bag, as multiple references to the
same location can legitimately occur. As we shall see, we need to know
how many references were made to a particular location, not just whether
or not something has been referred to.

There are four different ways in which these components can change 
in the evaluation of an expression. 

— When a non-array lvalue becomes a value, the ref_map is increased to
reflect the reference of the object denoted by the lvalue.² If the part of
memory referred to is in the update_map, this causes undefinedness.

— When a side effect is applied, it is removed from the pending side
effects bag, and the update_map is increased, recording the fact that
part of memory has just been changed. If that part of memory has
already been updated or referred to, this causes undefinedness.

— When an assignment completes its evaluation, a side effect to update
the appropriate part of memory with a new value is added to the pending
side effects bag. Assignment expressions keep track of references
made on their right hand sides, and those that were to the object to
be updated are removed from the ref_map. Failure to do this would
cause the side effect created as a result of evaluating i = i + 1 to
clash with the reference to i on the expression’s RHS.

Because the ref_map records a count of the number of times a piece
of memory has been referred to, this deletion of references may still
leave references recorded. Using only a set for ref_map would allow

    i + (i = i + 1)

to avoid revealing its undefined nature. A possible evaluation would
have the i on the assignment’s RHS remove the record of a previous
reference to i on the LHS of the addition.

— When the pending side effects bag is empty, and a sequence point is
reached in an expression’s syntax, the ref_map and update_map are
“zero-ed”, thereby allowing a new sequence of reference and updates
in the next phase of execution. If a sequence point is reached, and the
bag of pending side effects is not empty, it will need to be emptied
before the next stage of the expression can be evaluated.

² Array lvalues become pointers to their first element; this transformation does not
require a reference to memory.
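The bookkeeping described in the four cases above can be sketched as follows. This is an illustrative model rather than Cholera's HOL definition: the class `State`, its method names, and the `refs_as_bag` switch are assumptions introduced here, and addresses are simply variable names.

```python
from collections import Counter


class Undefined(Exception):
    """Signals a violation of the sequence-point constraint."""


class State:
    def __init__(self, memory, refs_as_bag=True):
        self.memory = dict(memory)
        self.pending = Counter()        # bag of pending side effects (addr, value)
        self.update_map = set()         # addresses updated since the last sequence point
        self.ref_map = Counter()        # references made since the last sequence point
        self.refs_as_bag = refs_as_bag  # False: ref_map degenerates to a set

    def read(self, addr):
        """An lvalue becomes a value: record the reference."""
        if addr in self.update_map:
            raise Undefined(f"read of updated object {addr}")
        self.ref_map[addr] = self.ref_map[addr] + 1 if self.refs_as_bag else 1
        return self.memory[addr]

    def assign(self, addr, value, rhs_refs):
        """An assignment completes: queue its side effect and discount the
        rhs_refs references to addr that were made on its right-hand side."""
        self.pending[(addr, value)] += 1
        self.ref_map[addr] = max(0, self.ref_map[addr] - rhs_refs)

    def apply_se(self, addr, value):
        """Apply one pending side effect."""
        if self.pending[(addr, value)] == 0:
            raise ValueError("side effect is not pending")
        self.pending[(addr, value)] -= 1
        if addr in self.update_map or self.ref_map[addr] > 0:
            raise Undefined(f"update of referenced or updated object {addr}")
        self.memory[addr] = value
        self.update_map.add(addr)

    def sequence_point(self):
        """Reset both maps; all pending side effects must already be applied."""
        if sum(self.pending.values()) > 0:
            raise ValueError("pending side effects remain")
        self.update_map.clear()
        self.ref_map.clear()


def incr(state):
    """Models  i = i + 1  followed by a sequence point."""
    v = state.read("i")
    state.assign("i", v + 1, rhs_refs=1)
    state.apply_se("i", v + 1)
    state.sequence_point()
    return state.memory["i"]


def add_assign(state):
    """Models  i + (i = i + 1), evaluating the left operand first."""
    left = state.read("i")
    v = state.read("i")
    state.assign("i", v + 1, rhs_refs=1)
    state.apply_se("i", v + 1)
    return left + (v + 1)
```

With the bag, `add_assign` raises `Undefined` because one reference to i survives the assignment's clean-up. With `refs_as_bag=False` the surviving reference is lost and the expression wrongly appears well defined, which is the failure the text attributes to a set-based ref_map.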

3 Intuition and proof outline 

This may have already suggested that C’s expression semantics, though 
superficially full of non-determinism, is actually so seriously constrained 
that expressions can only evaluate in one way, whether this be to one 
valid result, or to undefinedness. Here we suggest why this is the case, 
and sketch the form of the proof that is to come. 

Ignoring for the moment the fact that side effects are not necessar- 
ily applied immediately nor in order, one can think of the various sub- 
expressions of a greater expression as parallel processes running simulta- 
neously and sharing memory. Clearly, the behaviour of these processes is 
solely dependent on the parts of memory that they reference. But this 
implies that the processes can’t affect each other: a change to a piece 
of memory by one process that another references is forbidden by the 
constraints spelled out in the previous section. 

If a sub-expression can’t affect the parts of memory that another 
depends on, and vice versa, then their evaluation must proceed entirely 
deterministically. Conversely, if shared memory is updated illegally, then 
undefinedness must result. 

The fact that side effects are not applied immediately is also seen to be 
irrelevant. All side effects will come to be applied eventually, as reaching 
a sequence point requires this, and at the minimum, there is a sequence 
point at the end of the evaluation of all expressions that appear within 
statements. Though an update may come quite late, the constraints forbid 
the updating of memory that has been referred to as much as they forbid 
reference of updated memory. 

Nonetheless, there is still a problem with the above intuition: it ignores 
the effect of sequence points that appear within an expression. Consider 
the following expression: 



    x + ((x = 3), 4)

Given what we have seen so far of the semantics, it would appear
that this expression should be genuinely non-deterministic. The comma
operator on the right is a sequence point, so if an evaluation were to
proceed by first evaluating x = 3, reaching the sequence point, clearing
the update_map, and then proceeding with the rest of the expression, it
should go on to give a result of 7.

On the other hand, if the lone x on the left were to be evaluated first, 
then the subsequent assignment expression (necessarily the next thing 
to be evaluated) would cause undefinedness, because it would update an 
object which had already been referenced. 

In fact, a subtle argument about this case forces the conclusion that 
the expression is necessarily undefined.³ An official response by the Standards
committee to a public query (a “Defect report”) [4, #117] makes it
clear that if it is possible for an expression to exhibit undefined behaviour 
(there might be an order of evaluation that does this, for example), then 
the whole expression is undefined. 
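The rule that a single undefined evaluation order poisons the whole expression can be sketched for x + ((x = 3), 4). The model below is an assumption-laden simplification: the side effect of the assignment is applied immediately, only the object x is tracked, and the step names and the functions `run` and `overall` are hypothetical.

```python
UNDEFINED = "undefined"


def run(order, x0=0):
    """Evaluate x + ((x = 3), 4) with its steps taken in the given order."""
    memory = {"x": x0}
    referenced, updated = set(), set()
    left = None
    for step in order:
        if step == "read x":            # the lone x on the left of +
            if "x" in updated:
                return UNDEFINED
            referenced.add("x")
            left = memory["x"]
        elif step == "x = 3":           # assignment, side effect applied at once
            if "x" in referenced or "x" in updated:
                return UNDEFINED
            memory["x"] = 3
            updated.add("x")
        elif step == "comma":           # sequence point: both records are cleared
            referenced.clear()
            updated.clear()
    return left + 4


def overall(outcomes):
    """Undefined if any evaluation order is undefined, else the set of results."""
    outcomes = set(outcomes)
    return UNDEFINED if UNDEFINED in outcomes else outcomes


orders = [
    ("x = 3", "comma", "read x"),   # right operand first
    ("read x", "x = 3", "comma"),   # left operand first
]
results = [run(order) for order in orders]
verdict = overall(results)
```

The first order yields 7 and the second is undefined, so under the committee's reading `verdict` is undefined for the expression as a whole.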

Cholera does model this requirement, but is forced to do so at the
level above the definition of $\to_e$. We add a rule to the effect that a given,
defined, reduction sequence is only part of the semantics if there doesn’t 
exist any other sequence which makes the behaviour undefined. However, 
this additional detail in the semantics is difficult to reason about, so we 
choose to examine those expressions which are free of internal sequence 
points. Unless otherwise stated, all results stated here will be for expres- 
sions that are free of internal sequence points. 

Our eventual determinism result for sequence point free expressions 
then naturally holds of expressions where the only sequence points are 
present at the top level (such as in x || (y && z)). This is because
such an expression has deterministic sub-expressions, and these must be 
evaluated in the order dictated by the presence of the sequence points, 
giving an overall behaviour which must also be deterministic. 



3.1 Proof outline 

We should like to demonstrate determinism by showing a diamond prop- 
erty for all of the possible reductions that an expression might undergo. 
Graphically, this amounts to showing that in all situations we can find 
reductions to fill in the dashed lines below: 



              (e0, σ0)
             /        \
            /          \
    (e1, σ1)            (e2, σ2)
            .          .
             .        .
              (e', σ')

³ My thanks to Mark Brader for explaining this to me.



It follows that if this can be shown for single steps of a reduction 
relation, then that reduction system must be confluent. Unfortunately, 
this property does not hold in general for C. In particular, reductions 
that involve undefined behaviour tend to invalidate further reductions, 
and if the reduction to $(e_1, \sigma_1)$, say, caused undefined behaviour then
there is no guarantee that a reduction analogous to the one taken to get
to $(e_2, \sigma_2)$ should be possible.
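For a finite reduction relation the single-step diamond property can be checked mechanically. The sketch below assumes the relation is given as a dictionary from each state to the set of its one-step successors; the function name `diamond_holds` and both example relations are illustrative only.

```python
def diamond_holds(succ):
    """Check the single-step diamond property of a finite relation.

    succ maps each state to the set of its one-step successors. The property
    holds if any two distinct successors of a state share a common successor.
    """
    for nexts in succ.values():
        for b in nexts:
            for c in nexts:
                if b != c and not (succ.get(b, set()) & succ.get(c, set())):
                    return False
    return True


# Side-effect-free additions: either operand may be reduced first.
pure = {
    "(1+2)+(3+4)": {"3+(3+4)", "(1+2)+7"},
    "3+(3+4)": {"3+7"},
    "(1+2)+7": {"3+7"},
    "3+7": {"10"},
    "10": set(),
}

# One branch becomes undefined and admits no further reductions.
with_undefined = {
    "start": {"ok", "undefined"},
    "ok": {"value"},
    "undefined": set(),
    "value": set(),
}

pure_ok = diamond_holds(pure)
undefined_ok = diamond_holds(with_undefined)
```

The second relation fails the check because nothing is reachable from the undefined state, mirroring the reason given above for why the property cannot hold for C in general.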

Therefore, the first step in attacking the proof is to divide it into two 
parts. First we demonstrate confluence for evaluations which terminate 
normally, i.e., those which yield a value and which apply all of the side 
effects generated in the course of the expression evaluation. Then we show 
that if an evaluation sequence exists which leads to undefined behaviour, 
this undefinedness can not be escaped, and that all states reachable from 
the initial one must necessarily either be undefined themselves, or still 
admit the possibility of becoming undefined in one or more steps. 

This second result makes it clear that a normal terminating evaluation 
and an undefined one can not both begin from the same initial state. We 
reason as follows: assume that such a situation exists. Then our second 
result states that it is possible to reach undefinedness from the final state 
of the normal evaluation. But if it is a final state, then it can not take 
any more steps, and it is not in an undefined state itself because it has 
yielded a proper value. Thus we have a contradiction and an assurance 
to the effect that all evaluations are in fact deterministic. 

4 Successful evaluations 

Even with the above assumption that our reduction sequences do not 
become undefined, the task of proving determinism for expression evalu- 
ation is quite complicated. In particular, the system as described is made 
difficult to reason about by the fact that side effect applications and 
other forms of reduction can intermingle. The first stage of our proof is 
to demonstrate that side effect applications can all be postponed to the
end of an evaluation sequence without affecting the result. 

This should be clear from the constraints described earlier: if a side 
effect application were to make a difference, a subsequent reference to 
memory would need to look at some part of memory that the side effect 
had changed; but this is precisely one of those situations forbidden (a 
reference to updated memory) and would lead to undefinedness, contra- 
dicting our earlier assumption. 

The proof proceeds by first showing that side effect applications and 
other reductions can commute. 

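Lemma 1. For all expressions $e_0$, $e$, for all states $\sigma$, $\sigma_0$, $\sigma_1$, and for all
side effects $\eta$, if $\eta$ is pending in $\sigma$, with $\sigma_0 = \mathrm{apply\_se}(\sigma, \eta)$ (i.e., $\sigma_0$ is
the state that results from applying $\eta$ to $\sigma$), and $(e_0, \sigma_0) \to_e (e, \sigma_1)$ then
there exists a state $\sigma'$ such that $(e_0, \sigma) \to_e (e, \sigma')$, $\eta$ is pending in $\sigma'$ and
$\sigma_1 = \mathrm{apply\_se}(\sigma', \eta)$.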

This is a straightforward rule induction on the inductive definition of
$\to_e$. Another induction readily extends this to allow side effect applications
to be pushed past any number of other expression reduction steps.
Using this, we then induct on the number of reductions to prove our 
“separation theorem” : 

Theorem 1 (Separation). For all expressions $e_0$, $e$, and for all states
$\sigma_0$, $\sigma$, if $(e_0, \sigma_0) \to_e^{*} (e, \sigma)$, then there exists a state $\sigma'$ and a sequence
of side effects $\eta_1 \dots \eta_n$ where both the update_maps and memory contents
of $\sigma_0$ and $\sigma'$ are the same, and $(e_0, \sigma_0) \to_e^{*} (e, \sigma')$ and $\sigma$ is the result of
applying the side effects $\eta_1 \dots \eta_n$ to $\sigma'$.

(Note that the final value e is present after the expression reduction 
steps, and before the side effect applications begin. This is because these 
later applications can not change the value that an expression yields.) 

We now consider the $\to_e$ relation as the union of two components:
reductions where no side effect applications occur, and reductions that
are exclusively side effect applications. Let us use $\to_E$ for the former and
$\to_A$ for the latter so that $\to_e \,=\, \to_E \cup \to_A$. Confluence for both $\to_E$ and
$\to_A$, together with the separation theorem, implies confluence for $\to_e$ as
follows:

1. Consider two reduction sequences starting at $(e_0, \sigma_0)$ that both complete
normally. One is to $(e_1, \sigma_1)$ and the other is to $(e_2, \sigma_2)$.

2. By the separation theorem, both reduction sequences can be separated
into two phases, with intermediate points $(e_1, \sigma_1')$ and $(e_2, \sigma_2')$, such
that $(e_0, \sigma_0) \to_E^{*} (e_1, \sigma_1')$ and $(e_1, \sigma_1') \to_A^{*} (e_1, \sigma_1)$ (similarly for $e_2$
etc.)

3. Because $e_1$ and $e_2$ represent completed evaluations, they must be values.
As $\to_A$ only applies side effects, it doesn’t change expressions.
Thus the states reached by $\to_E^{*}$ must be terminal with respect to it.
Then if $\to_E$ is confluent, these intermediate states are actually the
same.

4. Now we have two reduction sequences involving $\to_A$ from the same
starting point. As $\to_A$ is also confluent, the final states are necessarily
identical.

Given this result, we need only prove that $\to_E$ and $\to_A$ are confluent.



4.1 Confluence for \(\rightarrow_E\)

We establish confluence for \(\rightarrow_E\) by demonstrating a diamond
property for single steps of the relation.

Before beginning a proof such as this, it is instructive to consider
parallels with the similar task that one faces in attempting to prove
confluence for the \(\lambda\)-calculus. There, things are somewhat complicated
by the fact that a reduction in the RHS of a \(\beta\)-redex may have to be
matched by many repetitions of essentially the same reduction in an alternative
branch where the RHS has been substituted into the body of the LHS.
This doesn’t happen in the Cholera semantics, where substitution doesn’t
arise.

However, the \(\lambda\)-calculus is at least entirely syntax-directed; if a
redex is present, then the reduction can always take place, and its result will
always be the same. Reductions in the \(\lambda\)-calculus can be said to ignore
their context. This is not the case in Cholera where the accompanying
state, an ever-present and varying context, can affect reductions. This
is not just a matter of different values for variables affecting the value
of an expression, but more significant: a state with a large update_map
may make a reduction that would otherwise turn a variable into a value
instead produce undefinedness.

With this motivation behind us, the first stage in our proof will be
to characterise the degree to which states can vary and yet still produce
the same reduction for a given piece of syntax. Furthermore, because
expression reductions affect the state*, we want to characterise the way
in which this happens, so that, ultimately, we will be able to state that
reduction \(x\) can reduce in the same way both before and after reduction
\(y\).

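Theorem 2 (Reduction characterisation). If \(\langle e_0,\sigma_0\rangle
\rightarrow_E \langle e,\sigma\rangle\) then there exists a function \(f\)
characterising the reduction, such that \(f(\sigma_0) = \sigma\), and for all
\(\sigma_0'\) which are “no more restrictive” than \(\sigma_0\),
\(\langle e_0,\sigma_0'\rangle \rightarrow_E \langle e,f(\sigma_0')\rangle\).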

The meaning of “no more restrictive” above turns out to be rather
detailed in its expression, really suitable only for the consumption of a
theorem prover. In essence it requires that the update_map be no bigger in
\(\sigma_0'\) than it is in \(\sigma_0\), but there are also a number of
conditions required of both the initial states and the expressions involved.
One of these is that \(e_0\) be well-typed. Another is that \(e\) not be
undefined; computations that do allow \(e\) to become undefined are discussed
in section 5.

We also have the following important lemma, which like the previous 
is established by induction over the reduction relation. 



Lemma 2 (Reduction preconditions preserved). If \(\langle e_0,\sigma_0\rangle
\rightarrow_E \langle e,\sigma\rangle\), then \(\sigma\) is no more restrictive
than \(\sigma_0\) in the sense of theorem 2.

Now we can prove the diamond property for \(\rightarrow_E\) relatively
straightforwardly. Again an induction is required over the reduction relation.
The inductive or “sub-expression” cases, where we have two reductions within
the same sub-expression, are handled by the inductive hypotheses, so it is
just the cases where an expression form admits two reductions in different
sub-expressions which prove difficult. This includes both the normal binary
operators, and also assignment, which needs to be treated separately
because unlike the other operators, it adds a side effect to those pending.

In such a situation, our reduction characterisation and reduction
preconditions results tell us immediately that a “diamond” of four sides can
be constructed. If the functions required to exist by the first result are
\(f\) and \(g\), then the diagram looks like:

[Diagram: diamond whose bottom vertex is
\(\langle e, (g \circ f)(\sigma_0)\rangle = \langle e, (f \circ g)(\sigma_0)\rangle\)]

* Though \(\rightarrow_E\) holds update_maps and thus memory constant, we will
still get new side effects being added to the queue of those pending, and as
objects are referred to, ref_maps also increase.

The question then remains as to whether or not \(f\) and \(g\) will commute.
They do in fact, as each does little more than specify the additions to the
starting state’s ref_map and pending side effects. Addition on bags being
commutative, the result follows.
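
The commutation argument can be illustrated with a small sketch. The State
class, its two fields, and the functions f and g below are hypothetical
stand-ins chosen for illustration, not Cholera's actual definitions; they
assume only what the text says, namely that a characterising function adds to
the ref_map and to the bag of pending side effects.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """Hypothetical, simplified state: only the parts an expression step extends."""
    ref_map: Counter = field(default_factory=Counter)   # bag of referenced objects
    pending: Counter = field(default_factory=Counter)   # bag of pending side effects


def extend(refs, effects):
    """Build a characterising function that only *adds* to ref_map and pending."""
    def apply(state):
        return State(state.ref_map + Counter(refs),
                     state.pending + Counter(effects))
    return apply


# f models a reduction in one sub-expression, g a reduction in another
# (both invented for this example).
f = extend(refs=["x"], effects=[("x", 1)])
g = extend(refs=["y"], effects=[("y", 2)])

s0 = State()
# Bag addition is commutative, so both paths around the diamond agree.
assert g(f(s0)) == f(g(s0))
```

Because both functions are pure additions on bags, the order of application
cannot be observed in the resulting state.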

4.2 Confluence for \(\rightarrow_A\)

The second requirement of the proof is to show that the \(\rightarrow_A\)
relation is confluent. We show this by demonstrating a diamond property. This
is a considerably simpler task than for \(\rightarrow_E\).

Recall that we are performing reductions in a context where all of the
side effects can be applied successfully, resulting in normal termination
with a value. This implies that no pair of pending side effects affects
overlapping parts of memory. We show this by contradiction. One of the side
effects must have been applied first. Subsequent to this application, the
other side effect cannot have been applied because this would result in
undefined behaviour (two updates of the same part of memory). But if
the second side effect is not applied, then the final state must still have
side effects pending, which also contradicts our assumption, because a
normal termination is a sequence point, by which stage all side effects
must have been applied.

So, all of the side effects affect different parts of memory, and can 
therefore be applied independently of one another. The required diamond 
property is an immediate consequence of this. 
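
As a minimal sketch of this independence argument, assume that memory is a
dictionary and that a side effect is an (address, value) pair; both
representations, and the helper names apply_all and outcomes, are assumptions
made for illustration. Applying disjoint side effects in every possible order
yields a single final memory, whereas two updates of the same location are
undefined in every order.

```python
from itertools import permutations


def apply_all(memory, effects):
    """Apply (address, value) updates in order; None models undefinedness."""
    memory = dict(memory)
    updated = set()
    for address, value in effects:
        if address in updated:        # second update of the same location
            return None
        memory[address] = value
        updated.add(address)
    return memory


def outcomes(memory, effects):
    """Collect the distinct results of applying the effects in every order."""
    results = []
    for order in permutations(effects):
        result = apply_all(memory, order)
        if result not in results:
            results.append(result)
    return results


disjoint = [("a", 1), ("b", 2), ("c", 3)]
assert outcomes({"a": 0, "b": 0, "c": 0}, disjoint) == [{"a": 1, "b": 2, "c": 3}]
```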

5 Undefined evaluations 

We begin by defining state safety. A state is safe if none of its pending
side effects conflicts either with another pending side effect (i.e., no two
affect overlapping parts of memory), or with the state’s ref_map and
update_map. It should be clear that a state which is safe can apply all of its
side effects without becoming undefined. The converse is the basis of our first
lemma in this section.

Lemma 3 (Finite and unsafe states can become undefined). If a
state \(\sigma_0\) is both unsafe and has a finite bag of pending side effects,
then for all \(e_0\) there exists a reduction sequence such that
\(\langle e_0,\sigma_0\rangle \rightarrow_e^* U\), where \(U\) represents
undefinedness.

Also, for all \(e_0\), \(e\), \(\sigma_0\) and \(\sigma\), if \(\sigma_0\) is
unsafe, and \(\langle e_0,\sigma_0\rangle \rightarrow_e^*
\langle e,\sigma\rangle\), then \(\sigma\) is also unsafe.
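
The safety condition can be sketched as a simple check. The representation
below is an assumption made for illustration: pending side effects, ref_map
entries and update_map entries are all reduced to the memory ranges they
touch, written as (start, length) pairs. Cholera's real state components carry
more information than this.

```python
from itertools import combinations


def overlaps(a, b):
    """Ranges are (start, length) pairs; they overlap if they share a byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def is_safe(pending, ref_map, update_map):
    """Safe: no pending side effect conflicts with another pending side
    effect, nor with a range recorded in ref_map or update_map."""
    if any(overlaps(p, q) for p, q in combinations(pending, 2)):
        return False
    touched = list(ref_map) + list(update_map)
    return not any(overlaps(p, t) for p in pending for t in touched)


assert is_safe([(0, 4), (4, 4)], ref_map=[], update_map=[])
assert not is_safe([(0, 4), (2, 4)], ref_map=[], update_map=[])
```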



Cholera represents undefinedness arising as a result of expression eval- 
uation (e.g., division by zero, or a reference to a variable already updated) 
by replacing the offending expression with U in the syntax tree and then 
letting this “bubble” its way to the top of the tree. This can not be 
prevented. 
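
The “bubbling” can be pictured with a toy model. The sketch below assumes
expressions are nested tuples (operator, left, right) and uses a simplified
step function that lifts U past every node with an undefined direct child; it
is not Cholera's reduction relation, only an illustration of why U, once
present, can always reach the top of the tree.

```python
U = "U"   # marker for the undefined expression


def step(expr):
    """One bubbling step: a node with an undefined direct child becomes U."""
    if not isinstance(expr, tuple):
        return expr                      # leaf value (or U itself)
    op, left, right = expr
    if left == U or right == U:
        return U
    return (op, step(left), step(right))


def bubble(expr):
    """Repeat bubbling steps until the expression stops changing."""
    nxt = step(expr)
    while nxt != expr:
        expr, nxt = nxt, step(nxt)
    return expr


assert bubble(("+", 1, ("*", ("/", U, 2), 3))) == U
assert bubble(("+", 1, 2)) == ("+", 1, 2)
```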



Lemma 4 (Undefined sub-expressions can always ascend). If an
expression \(e_0\) contains \(U\) as a sub-expression, then for all
\(\sigma_0\) there exists a reduction sequence such that
\(\langle e_0,\sigma_0\rangle \rightarrow_e^* U\).

Also, for all \(e_0\), \(e\), \(\sigma_0\) and \(\sigma\), if \(e_0\) has an
undefined sub-expression, and \(\langle e_0,\sigma_0\rangle \rightarrow_e^*
\langle e,\sigma\rangle\), then \(e\) must also have an undefined
sub-expression (where \(e\) itself may be that undefined sub-expression).



These two results (neither of which is particularly surprising) make it
clear that a large class of expression-state pairs, those which are unsafe
or which have undefined subexpressions, though not necessarily “fully
undefined”, might as well be. We shall refer to such states as effectively
undefined. Though a state’s being effectively undefined may not seem
such a strong claim initially, the condition preservation clauses of the
lemmas above should make it clear that an effectively undefined state is
one which can never yield a value. In conjunction with the fact that all
sequence point free expressions must terminate†, we can see that effectively
undefined means “will necessarily become undefined”.

Our next theorem is more significant. We wish to show that if a
reduction occurs which makes something effectively undefined, when it was
not effectively undefined before, then if one takes a different step from
the same initial state, the result will either be effectively undefined, or
it will retain the ability to make a reduction to an effectively undefined
state. This can be represented as a “broken” diamond:

[Diagram: “broken” diamond]

† Sequence point free expressions do not include function applications.

Another analogy is that of the cliff-edge. Over the edge lies effective
undefinedness. Once one reaches the edge, one can walk along it, but while
it may be possible to avoid falling over the edge for some indeterminate
length of time, it is not possible to move away. The proof proceeds in a
similar way to that of the proof of the confluence of \(\rightarrow_E\).

While the inductive cases are straightforward, we need to cope with
the fact that the reduction from \(\langle e_0,\sigma_0\rangle\) to
\(\langle e_1,\sigma_1\rangle\) might involve a reduction in a sub-expression
unrelated to that which produced the undefinedness. Inside
\(\langle e_1,\sigma_1\rangle\) we want to have a reduction occur that is
analogous to the one that produced undefinedness from
\(\langle e_0,\sigma_0\rangle\). We do this by again establishing a reduction
characterisation result, and by demonstrating that reductions preserve this.

In this case, the characterisation is essentially that an analogous
reduction to undefinedness can occur in any state that is at least as
restrictive as the original. This condition is preserved both by
\(\rightarrow_E\) and \(\rightarrow_A\).

We then do an induction on the number of steps along the cliff’s edge
to produce:

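Theorem 3 (The cliff’s edge). For all \(e_0\), \(\sigma_0\), if
\(\langle e_0,\sigma_0\rangle \rightarrow_e \langle e_1,\sigma_1\rangle\)
and \(\langle e_1,\sigma_1\rangle\) is effectively undefined, then for all
\(e_2\) and \(\sigma_2\) such that
\(\langle e_0,\sigma_0\rangle \rightarrow_e^* \langle e_2,\sigma_2\rangle\),
there exists \(e'\) and \(\sigma'\) such that
\(\langle e_2,\sigma_2\rangle \rightarrow_e^* \langle e',\sigma'\rangle\)
and \(\langle e',\sigma'\rangle\) is effectively undefined.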

We still need to add one more diamond property. This is a surprisingly
easy proof as it does not require an induction over the meaning relation.
Instead the characterisation functions and our lemma (1) that
\(\rightarrow_E\) and \(\rightarrow_A\) commute combine to give:

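Theorem 4 (A diamond property for \(\rightarrow_E\) and \(\rightarrow_A\)).
For all \(e_0\), \(e_1\), \(e_2\), \(\sigma_0\), \(\sigma_1\), \(\sigma_2\): if
\(\langle e_0,\sigma_0\rangle \rightarrow_E \langle e_1,\sigma_1\rangle\) and
\(\langle e_0,\sigma_0\rangle \rightarrow_A \langle e_2,\sigma_2\rangle\) and
both \(\langle e_1,\sigma_1\rangle\) and \(\langle e_2,\sigma_2\rangle\) are
not effectively undefined, then there exist \(e\) and \(\sigma\) (possibly
effectively undefined), such that
\(\langle e_1,\sigma_1\rangle \rightarrow_A \langle e,\sigma\rangle\) and
\(\langle e_2,\sigma_2\rangle \rightarrow_E \langle e,\sigma\rangle\).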

The final proof is now possible. We wish to show that if a reduction
sequence takes an initial state (\(\langle e_0,\sigma_0\rangle\)) to
undefinedness, then all other possible destinations from the same starting
point retain this possibility. In essence, we exploit the possibility of
completing a confluent diamond on the cliff-tops.

1. We have a reduction sequence from \(\langle e_0,\sigma_0\rangle\) to
undefinedness. Let \(\langle e_1,\sigma_1\rangle\) be the last state in this
sequence not effectively undefined.

2. We have another reduction sequence to \(\langle e_2,\sigma_2\rangle\), and
by assumption this is not effectively undefined.

3. Therefore, using all three of our diamond properties for \(\rightarrow_E\)
and \(\rightarrow_A\), we have a common possible destination for both
\(\langle e_1,\sigma_1\rangle\) and \(\langle e_2,\sigma_2\rangle\).
Call this \(\langle e,\sigma\rangle\).

4. Having come along the cliff’s edge from \(\langle e_1,\sigma_1\rangle\),
\(\langle e,\sigma\rangle\) must still be on the edge, thereby retaining the
possibility of a reduction to an effectively undefined state, if it is not an
effectively undefined state already.

5. Effectively undefined states all allow for a reduction sequence to “full”
undefinedness, so \(\langle e_2,\sigma_2\rangle\) must do so as well by virtue
of being able to reduce to \(\langle e,\sigma\rangle\).

6 Conclusion 

The fact that we have ended with proofs of diamond properties for
\(\rightarrow_E\), \(\rightarrow_A\) and \(\rightarrow_E\) vs.
\(\rightarrow_A\) may suggest that the rather specialised proof strategy
used in section 4.1 might as well have been subsumed into an
all-encompassing proof of confluence for the whole meaning relation. In
particular, it is easy to see with hindsight that demonstrating a diamond
property, where neither reduction is to an effectively undefined
destination, would have been reasonably straightforward. Nonetheless, the only
theorem that becomes redundant in this alternative proof is the separation
result (theorem 1). All the other results given are necessary parts of
either proof.

It is extremely important that this work was built on the support
provided by mechanical theorem proving (HOL, in this case). It would
have been unimaginable without that support. The proof script for proving
this result is almost 6000 lines of SML code (excluding comments).
This work is thus a demonstration of both the importance and utility of
mechanised theorem-proving. The mechanisation of the semantics ensures
that one can be sure of one’s results, and that no details have been
overlooked. In this work, the diamond proofs in question involved analysis of
many (approximately 200) different cases corresponding to a pair-wise
examination of all the possible ways in which all possible expressions might
evolve. Such a proof done by hand would be inevitably subject to question
because of the high possibility of error. With HOL’s help, the possibility
of error has been eliminated.

This work is also valuable because it demonstrates an interesting result 
about the programming language C. This in turn is a demonstration that 
the practical formalisation of programming language semantics is not an 
impossible dream. In [8], Ritchie says “the C standard did not attempt 
to specify formally the language semantics, and so there can be dispute 
over fine points”. In the formal setting provided by Cholera, fine points 
are no longer the subject of dispute: not only does the language gain 
an unambiguous specification, it is also possible to state the definition’s 
consequences with certainty.

Abstract. A Hoare-style programming logic for the sequential kernel
of Java is presented. It handles recursive methods, class and interface
types, subtyping, inheritance, dynamic and static binding, aliasing via
object references, and encapsulation. The logic is proved sound w.r.t. an
SOS semantics by embedding both into higher-order logic.



1 Introduction 

Java is a practically important object-oriented programming language. This pa- 
per presents a logic to verify sequential Java programs. The motivations for 
investigating the logical foundations of Java are as follows: 

1. Java plays an important role in the quickly developing software component 
industry and the smart card technology. Verification techniques can be used 
for static program analysis, e.g., to prove the absence of null-pointer excep- 
tions. The Java subset used in this paper is similar to JavaCard, the Java 
dialect for implementing smart cards. 

2. As pointed out in [MPH97], logical foundations of programming languages 
form a basis for program specification technology. They allow for expressive 
specifications (covering e.g., abstraction, sharing-properties, and side-effects) 
and are needed to assign a formal meaning to interface specifications. 

3. Formality is a prerequisite for tool-based verification. Tool support is neces- 
sary to keep large program proofs error-free. 

4. Java is typical of a large group of OO-languages including C++, Eiffel,
Oberon, Modula-3, BETA, and Ada95. The developed techniques can be
adapted to other languages of that group.

The goal underlying this research is the development of interactive programming
environments that support specification and verification of OO-programs. Some
design decisions have been made w.r.t. this goal.



Approach. Three aspects make verification of OO-programs more complex than
verification of programs with just recursive procedures and arbitrary pointer
data structures: Subtyping, abstract types, and dynamic binding. Subtyping
allows variables to hold objects of different types. Abstract types have different
or incomplete implementations. Thus, techniques are needed to formulate type
properties without referring to implementations. Dynamic binding destroys the
static connection between the calls and the body of a procedure.

Our solutions to these problems build on well-known techniques. Object 
stores are specified as an abstract data type with operations to create objects
and to read and write instance variables/attributes. Based on object stores,
abstractions of object structures can be expressed which are used to specify the
behavior of abstract types. To express the relation between stores in pre- and 
poststates, the current object store can be referenced through a special variable. 

Our programming logic refines Hoare logics for procedural languages. To
handle dynamic binding, the programming logic allows one to prove properties
of so-called virtual methods, i.e., methods that capture the common properties
of the corresponding methods in subtypes. The distinction between the virtual
behavior of a method and the behavior of the associated implementations allows
one to transfer verification techniques for procedures to OO-programs.

The logic is proved sound w.r.t. an SOS semantics of the programming language.
Since the semantics of modern OO-languages tends to be rather complex,
such soundness proofs can become quite long for full-size languages and should 
therefore be checkable by mechanical proof checkers. To provide a basis for me- 
chanical checking, we embed both semantics into a higher-order logic and derive 
the axioms and rules of the logic from those of the operational semantics. 



Related Work. In [Lei97], a wlp-calculus for an OO-language similar to our
Java subset is presented. In contrast to our work, method specifications are part
of the programs. The approach in [Lei97] can be considered as restricting our
approach to a certain program development strategy (in [PHM98], we discuss
this topic). Thereby, it becomes simpler and more appropriate for automatic
checking, but gives up flexibility that seems important to us for interactive
program development and verification. A different logic for OO-programs that is
related to type-systems is presented and proved sound in [AL97]. It is developed
for an OO-language in the style of the lambda calculus whereas we are aiming to
directly support the verification of an existing practical language. The presented
programming logic extends the foundations developed in [PHM98] by covering
encapsulation and subclassing. Furthermore, [PHM98] does not discuss the
relation between the logic and a formal semantics and does not prove soundness.

In [vON98], type-safety is formally proved for a Java subset similar to ours. 
Corresponding to our soundness proof, both operational semantics and typing 
rules are formalized in higher-order logic. However, the type-safety proof has 
already been mechanically checked in Isabelle. [JvdBH+98] uses an operational 
semantics of a Java subset to verify various properties of implementations with 
the PVS proof checker without employing an axiomatic semantics. As will be- 
come clear from the presented paper, Hoare-logic provides an additional level of 
abstraction. This simplifies the handling of subtyping and abstract methods, and 
proofs become more intuitive. In practice, verification requires elaborate
specification techniques like the one described in [Lea96]. In [MPH97], we
outline the connection between such specifications and our logic.

Overview. Section 2 presents the operational semantics of the Java kernel,
section 3 the programming logic. The soundness proof is contained in Sect. 4.



2 A Semantics for Sequential Java 

This section describes the sequential Java kernel, Java-K for short, and presents 
its dynamic semantics. Compared to Java, Java-K supports only a simple ex- 
pression and statement syntax, but captures the full complexity of the method 
invocation semantics. As specification technique we use structural operational 
semantics (SOS). We assume that the reader is familiar with Java and explain 
only the restrictions of Java-K. The formal presentation concentrates on those 
aspects that are needed for the soundness proof in Sect. 4. 



Java-K Programs. A Java-K program is a set of type declarations where a 
type is either a class or an interface. A class declares its name, its superclass, the 
list of interfaces implemented by the class, and its members. A member is a field, 
instance method, or static method. Members can be public, protected, or pri- 
vate. Java-K provides the default constructor, but does not support constructor 
definitions. Method declarations contain the access mode, the method signature, 
a list of local variables, and a statement as method body. To keep things simple,
methods in Java-K have exactly one parameter named p and always have a
return type. Overloading is not allowed. The return type, the name of the method,
and the parameter of the method are given by the so-called method signature.
An interface declares its name, the list of extended interfaces, and the signatures 
of its methods: 



data type
    JavaK-Program = list of TypeDecl
    TypeDecl      = ClassDecl( CTypeId CTypeId ITypeIdList ClassBody )
                  | InterfaceDecl( ITypeId ITypeIdList InterfaceBody )
    ClassBody     = list of MemberDecl
    MemberDecl    = FieldDecl( Mode Type FieldId )
                  | MethodDecl( Mode MethodSig VarList Statement )
                  | StaticMethDecl( Mode MethodSig VarList Statement )
    Mode          = Private() | Protected() | Public()
    InterfaceBody = list of MethodSig
    ITypeIdList   = list of ITypeId
    MethodSig     = Sig( Type MethodId Type )
    VarList       = list of VarDecl
    VarDecl       = Vardcl( Type VarId )
    Type          = booleanT() | intT() | nullT() | ct( CTypeId ) | it( ITypeId )



Java-K has the predefined types booleanT, intT, and nullT (the type of the
null reference), and the user defined class and interface types. The subtype
relation on sort Type is defined as in Java and denoted by \(\preceq\).

An expression in Java-K is an integer or boolean constant, the null reference,
a variable or parameter identifier, the identifier “this” (denoting the reference to
the object for which the non-static method was invoked), or a unary or binary
expression over relational or arithmetic operators. The statements of Java-K are
defined below along with their dynamic semantics.



Capturing Statement Contexts. The semantics of a statement depends on 
the context of the statement occurrence. We assume that the program context 
of a statement is always implicitly given and that we can refer to method decla- 
rations in this context. Method declarations are denoted by T@m where m is a 
method name in class T. MethDeclId is the sort of such identifiers. The function 

\(\mathit{body} : \mathit{MethDeclId} \rightarrow \mathit{Statement}\)

maps each method declaration to the statement constituting its body. If T is a 
class type and m a method of T, the function 

\(\mathit{impl} : \mathit{Type} \times \mathit{MethodId} \rightarrow \mathit{MethDeclId} \cup \{\mathit{undef}\}\)

yields the corresponding declaration; otherwise it yields undef. Note that T can 
inherit the declaration of m from a superclass. Similarly to method declaration 
identifiers, we introduce field declaration identifiers of the form T@a where a is a 
field name in class T. The sort of such identifiers is denoted by FieldDeclId. They 
are needed to distinguish instance variables with the same field name occurring 
in one object. 
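
As a minimal sketch of how impl and body interact with inheritance, consider
the following toy program table. The table PROGRAM, the class names A and B,
and the string bodies are hypothetical; only the names impl and body and the
T@m notation come from the text, and the lookup strategy (walk up the
superclass chain) is an assumption about one way to realise them.

```python
# Hypothetical program table: class -> (superclass, {method name: body}).
PROGRAM = {
    "Object": (None, {}),
    "A": ("Object", {"m": "body of A@m"}),
    "B": ("A", {"n": "body of B@n"}),      # B inherits m from A
}

UNDEF = None   # stands in for the value undef


def impl(cls, method):
    """Return the declaration identifier 'T@m' that class cls uses for
    method, searching superclasses for inherited declarations."""
    while cls is not None:
        superclass, methods = PROGRAM[cls]
        if method in methods:
            return f"{cls}@{method}"
        cls = superclass
    return UNDEF


def body(decl_id):
    """Map a method declaration identifier 'T@m' to its body."""
    cls, method = decl_id.split("@")
    return PROGRAM[cls][1][method]


assert impl("B", "m") == "A@m"             # inherited declaration
assert body(impl("B", "m")) == "body of A@m"
```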



States. A statement is essentially a partial state transformer. A state in Java-K 
describes (a) the current values for the local variables and for the method pa- 
rameters p and this, and (b) the current object store. Values in Java-K are either 
integers, booleans, the null reference, or references to objects of a class type: 



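data type
    Value = b( Bool )
          | i( Int )
          | null()
          | ref( CTypeId, ObjId )

\[
\begin{array}{lcl}
\tau & : & \mathit{Value} \rightarrow \mathit{Type} \\
\tau(b(B)) & = & \mathit{booleanT} \\
\tau(i(I)) & = & \mathit{intT} \\
\tau(\mathit{null}) & = & \mathit{nullT} \\
\tau(\mathit{ref}(T, OI)) & = & ct(T)
\end{array}
\]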



Values constructed by ref represent the references to objects. The sort ObjId
denotes some suitable set of object identifiers to distinguish different objects of
the same type. The function \(\tau\) yields the type of a value.
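
The data type Value and the function \(\tau\) translate almost directly into
code. The sketch below is one possible rendering; the Python class names, the
field names, and the use of strings for types are assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class B:            # b( Bool )
    value: bool


@dataclass(frozen=True)
class I:            # i( Int )
    value: int


@dataclass(frozen=True)
class Null:         # null()
    pass


@dataclass(frozen=True)
class Ref:          # ref( CTypeId, ObjId )
    class_id: str
    obj_id: int


Value = Union[B, I, Null, Ref]


def tau(v: Value) -> str:
    """Type of a value, mirroring the four defining equations of tau."""
    if isinstance(v, B):
        return "booleanT"
    if isinstance(v, I):
        return "intT"
    if isinstance(v, Null):
        return "nullT"
    return f"ct({v.class_id})"


assert tau(Ref("Point", 1)) == "ct(Point)"
```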

The state of an object is given by the values of its instance variables. We 
assume a sort InstVar for the instance variables of all objects and a function 

\(\mathit{instvar} : \mathit{Value} \times \mathit{FieldDeclId} \rightarrow \mathit{InstVar} \cup \{\mathit{undef}\}\)

where \(\mathit{instvar}(V, T@a)\) is defined as follows: If \(V\) is an object reference and the
corresponding object has an instance variable named T@a, this instance variable 
is returned. Otherwise instvar yields undef. The state of all objects and the 
information whether an object is alive (i.e., allocated) in the current program 
state is formalized by an abstract data type Object Store with sort Store and 
the following functions: 



\[
\begin{array}{lcl}
\_\langle\_ := \_\rangle & : & \mathit{Store} \times \mathit{InstVar} \times \mathit{Value} \rightarrow \mathit{Store} \\
\_\langle\_\rangle & : & \mathit{Store} \times \mathit{CTypeId} \rightarrow \mathit{Store} \\
\_(\_) & : & \mathit{Store} \times \mathit{InstVar} \rightarrow \mathit{Value} \\
\mathit{alive} & : & \mathit{Value} \times \mathit{Store} \rightarrow \mathit{Bool} \\
\mathit{new} & : & \mathit{Store} \times \mathit{CTypeId} \rightarrow \mathit{Value}
\end{array}
\]

\(OS\langle IV := V\rangle\) yields the object store that is obtained from \(OS\) by updating
instance variable \(IV\) with value \(V\). \(OS\langle T\rangle\) yields the object store that is
obtained from \(OS\) by allocating a new object of type \(T\). \(OS(IV)\) yields the value of
instance variable \(IV\) in store \(OS\). If \(V\) is an object reference, \(\mathit{alive}(V, OS)\)
tests whether the referenced object is alive in \(OS\). \(\mathit{new}(OS, TID)\) yields a
reference to an object of type \(ct(TID)\) that is not alive in \(OS\). Since the properties
of these functions are not needed for the soundness proof in Sect. 4, we do not discuss
their axiomatization here and refer the reader to [PHM98].
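
To make the five operations concrete, the following sketch gives one possible
model of the object store. It is not the axiomatization of [PHM98]; the
immutable Store class, the counter-based allocation scheme, and the encoding of
references as (class id, object id) pairs are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

Ref = Tuple[str, int]          # (class id, object id), standing in for ref(T, OI)


@dataclass(frozen=True)
class Store:
    """Immutable store: every operation returns a new Store."""
    fields: Mapping[Tuple[Ref, str], Any]   # instance variable -> value
    allocated: Mapping[str, int]            # class id -> number of live objects

    def update(self, inst_var, value):      # models OS<IV := V>
        return Store({**self.fields, inst_var: value}, self.allocated)

    def allocate(self, class_id):           # models OS<T>
        n = self.allocated.get(class_id, 0)
        return Store(self.fields, {**self.allocated, class_id: n + 1})

    def read(self, inst_var):               # models OS(IV)
        return self.fields.get(inst_var)


def alive(ref, store):
    """True if the referenced object has been allocated in store."""
    class_id, obj_id = ref
    return obj_id < store.allocated.get(class_id, 0)


def new(store, class_id):
    """Reference to an object of the class that is not alive in store."""
    return (class_id, store.allocated.get(class_id, 0))


empty = Store({}, {})
r = new(empty, "Point")
assert not alive(r, empty) and alive(r, empty.allocate("Point"))
```

In this model the reference returned by new becomes alive exactly when the
store is extended by allocate, and updates never change which objects are
alive.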

Program states are formalized as mappings from identifiers to values. To have
a uniform treatment for variables and the object store, we use $ as identifier for
the current object store:

\[
\mathit{State} = (\mathit{VarId} \cup \{\mathit{this}, p\} \rightarrow \mathit{Value} \cup \{\mathit{undef}\}) \times (\{\$\} \rightarrow \mathit{Store} \cup \{\mathit{undef}\})
\]

For \(S \in \mathit{State}\), we write \(S(x)\) for the application to a variable or parameter
identifier and \(S(\$)\) for the application to the object store. By \(S[x := V]\) and
\(S[\$ := OS]\) we denote the state that is obtained from \(S\) by updating variable \(x\)
and $, respectively. The canonical evaluation of expression \(e\) in state \(S\) is denoted
by \(\epsilon(S, e)\) yielding an element of sort Value or undef (note that expressions in
Java-K always terminate and do not have side-effects). The state in which all
variables are undefined is named initS.



Statement Semantics. The semantics of Java-K statements is defined by
inductive rules. \(S : s \rightarrow S'\) expresses the fact that executing statement \(s\)
in state \(S\) terminates in state \(S'\). In the rules, x and y range over variable or
parameter identifiers, and e over expressions.

In order to keep the size of the specification manageable, we assume that some 
Java-K statements are given in a syntax that is decorated with information from 
type and name analysis. An access to an instance variable a of static type T 
is written as T@a. Java-K provides statements for reading and writing instance 
variables with the following semantics (note that the context conditions of Java 
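and the antecedent of the rule guarantee that \(\mathit{instvar}(y, T@a)\) is defined):

\[
\frac{S(y) \neq \mathit{null}}
     {S : \texttt{x = y.T@a;} \rightarrow S[x := S(\$)(\mathit{instvar}(y, T@a))]}
\]

\[
\frac{S(y) \neq \mathit{null}}
     {S : \texttt{y.T@a = e;} \rightarrow S[\$ := S(\$)\langle \mathit{instvar}(y, T@a) := \epsilon(S, e)\rangle]}
\]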
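
The two rules can be read as state transformers. The sketch below assumes a
state is a dictionary whose key "$" holds the object store, that the store is
a dictionary keyed by (object reference, field declaration identifier), and
that None plays the role of null; these encodings and the function names are
assumptions made for illustration. When the premise fails, no rule applies, and
the sketch signals this by returning None instead of a successor state.

```python
def read_field(state, x, y, field_id):
    """x = y.T@a;  requires S(y) != null and yields
    S[x := S($)(instvar(y, T@a))]."""
    if state[y] is None:
        return None                      # premise fails: no successor state
    store = state["$"]
    return {**state, x: store[(state[y], field_id)]}


def write_field(state, y, field_id, value):
    """y.T@a = e;  requires S(y) != null and yields
    S[$ := S($)<instvar(y, T@a) := value>], where value is the
    already evaluated expression e."""
    if state[y] is None:
        return None
    store = state["$"]
    return {**state, "$": {**store, (state[y], field_id): value}}


point = ("Point", 0)
s = {"x": 0, "y": point, "$": {(point, "Point@a"): 5}}
assert read_field(s, "x", "y", "Point@a")["x"] == 5
```

Both functions build a new state rather than mutating the old one, which
mirrors the update notation used in the rules.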

In Java, there are four kinds of method invocations: (a) invocations of public
or protected methods, (b) invocations of private methods, (c) invocations of
superclass methods, and (d) invocations of static methods. Invocations of kind (a)
are dynamically bound, the others can be bound at compile time. To make the
context information visible within the SOS rules, we distinguish kind (a) and (b)
syntactically: y.T:m(e) denotes an invocation of kind (a) where T is the static
type of y. A statically bound invocation is denoted by y.T@m(e) where T is the
class in which m is declared. We can use the same syntax to handle invocations of
superclass methods: A Java method invocation of the form super.m(e) occurring
in a class C is in Java-K expressed by a call this.CSuper@m(e) where CSuper is
the nearest superclass of C containing a declaration of m. This way, semantics
of invocations of kind (b) and (c) can be given by the same rule. Invocations of
kind (d) behave similarly, but do not have a this-parameter.

To focus on the interesting aspects, Java-K does not support a return state- 
ment. The return value has to be assigned to a local variable “result” that is 
implicitly declared in all methods. Thus, method invocation means passing the 
parameters and the object store to the prestate, executing the invoked method, 
and passing the result and object store back to the invocation environment: 

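\[
\frac{S(y) \neq \mathit{null}, \quad \tau(S(y)) \preceq T, \quad
      \mathit{initS}[\mathit{this} := S(y), p := \epsilon(S, e), \$ := S(\$)] : \mathit{body}(\mathit{impl}(\tau(S(y)), m)) \rightarrow S'}
     {S : \texttt{x = y.T:m(e);} \rightarrow S[x := S'(\mathit{result}), \$ := S'(\$)]}
\]

\[
\frac{S(y) \neq \mathit{null}, \quad
      \mathit{initS}[\mathit{this} := S(y), p := \epsilon(S, e), \$ := S(\$)] : \mathit{body}(T@m) \rightarrow S'}
     {S : \texttt{x = y.T@m(e);} \rightarrow S[x := S'(\mathit{result}), \$ := S'(\$)]}
\]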

The rule for the invocation of static methods is identical to the last rule, ex- 
cept that no this-parameter has to be passed. Besides the statements described 
above, Java-K provides if and while statements, assignment statements with 
cast, sequential statement composition, and constructor calls. The rules for these 
statements are straightforward and given in the appendix. 
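
The difference between the two invocation rules can be sketched as follows.
The class table CLASSES, the classes A, B and C, and the use of Python
functions as method bodies are hypothetical; method bodies are assumed to
return the pair (result, final store), and the subtype premise of the first
rule is omitted for brevity.

```python
def a_m(this, p, store):
    return p + 1, store        # (result, final store)


def b_m(this, p, store):
    return p * 2, store


# Hypothetical class table: class -> (superclass, {method: body}).
CLASSES = {
    "A": (None, {"m": a_m}),
    "B": ("A", {"m": b_m}),    # B overrides m
    "C": ("A", {}),            # C inherits m from A
}


def impl(cls, method):
    """Find the body that class cls uses for method (assumed to exist)."""
    while cls is not None:
        superclass, methods = CLASSES[cls]
        if method in methods:
            return methods[method]
        cls = superclass
    return None


def invoke_dynamic(state, x, y, method, arg):
    """x = y.T:m(e);  binds on the dynamic type of the receiver S(y)."""
    receiver = state[y]
    if receiver is None:
        return None
    dynamic_class, _obj_id = receiver
    result, store = impl(dynamic_class, method)(receiver, arg, state["$"])
    return {**state, x: result, "$": store}


def invoke_static(state, x, y, cls, method, arg):
    """x = y.T@m(e);  always runs the body declared in class T."""
    receiver = state[y]
    if receiver is None:
        return None
    result, store = CLASSES[cls][1][method](receiver, arg, state["$"])
    return {**state, x: result, "$": store}


s = {"x": 0, "y": ("B", 0), "$": {}}
assert invoke_dynamic(s, "x", "y", "m", 5)["x"] == 10      # runs B@m
assert invoke_static(s, "x", "y", "A", "m", 5)["x"] == 6   # runs A@m
```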

3 A Programming Logic for Java 

This section presents a Hoare-style programming logic for Java-K. The logic al- 
lows one to formally verify that implementations satisfy interface specifications. 
For OO-languages, interface specifications are usually given by pre- and
postconditions for methods, class invariants, history constraints, etc. (cf. e.g. [Lea96]).
The formal meaning of such specifications is defined in terms of proof
obligations for methods (cf. [PH97]). In this paper, we concentrate on the verification
of dynamic properties. For proving properties about the object store, we refer to
[PH97]. This section defines the precise syntax of our Hoare triples and explains
the axioms and rules of the programming logic.



Specifying Methods and Statements. Properties of methods and statements 
are expressed by triples of the form {P}comp{Q} where P, Q are sorted first- 
order formulas and comp is either a statement occurrence within a given Java-K 
program, a method implementation represented by the corresponding method 
declaration identifier, or a so-called virtual method. Before we clarify the signa- 
ture over which P, Q are built, we explain the concept of virtual methods. 

Virtual Methods. Java-K supports dynamically bound method invocations. E.g.,
if T is an interface type, and T1, T2 are classes implementing T, an invocation
y.T:m(e) can lead to the execution of T1@m or T2@m depending on the object
held by y. To verify dynamically bound method invocations, we need method
specifications reflecting the properties of all implementations that might be
executed. Such specifications express the behavior of the so-called virtual methods.
For every non-private instance method m declared in or inherited by a type
T, there is a virtual method denoted by T:m. (This notation corresponds to the
syntax used for the invocation semantics in Sect. 2.) For private and static
methods, virtual methods are not needed, because statically bound invocations can
be directly handled using the properties of the corresponding method bodies.

Signatures of Pre- and Postconditions. In program specifications, we have to
refer to types, fields, and variables in pre- and postconditions. We enable that
by introducing constant symbols for these entities. For a given Java-K program,
\(\Sigma\) denotes the signature of sorts, functions, and constant symbols as described
in Sect. 2. In particular, it contains constant symbols for the types and fields.
Furthermore, we treat parameters, program variables, and the variable $ for the
current object store syntactically as constant symbols of sort Value and Store to
simplify quantification and substitution rules and to define context conditions
for pre- and postconditions.

A triple { P } comp { Q } is called a statement annotation, implementation 
annotation, or method annotation if the syntactical component comp is a state- 
ment, method implementation, or virtual method, respectively. Pre- and post- 
conditions of statement annotations are formulas over SU {this, p, $} U VARfm) 
where m is the method enclosing the statement and VARfm) denotes the set 
of local variables of m. Preconditions in method annotations or implementation 
annotations are formulas over AUjlhis, p, $}. Postconditions in such annotations 
are formulas over S U {result, $}. 

To handle recursive methods, we use sequents of the form 𝒜 ▷ A where 𝒜
is a set of method and implementation annotations and A is a triple. Triples in
𝒜 are called assumptions of the sequent and A is called the consequent of the
sequent. Intuitively, a sequent expresses the fact that we can prove a triple based 
on some assumptions about methods. 

Axiomatic Semantics. The axiomatic semantics of Java consists of axioms 
and rules for statements and methods. The new axioms and rules are described 
in the following two paragraphs. The standard Hoare rules (e.g., while rule) 
are presented in Fig. 1. A more detailed discussion of programming logics for 
OO-languages and their applications is given in [PHM98].

Statements. The cast-axiom is very similar to Hoare’s classical assignment ax- 
iom. However, to prevent runtime errors, a stronger precondition assures that the 
type conversion is legal. The constructor-axiom works like an assignment axiom: 
The new object is substituted for the left-hand-side variable and the modified 
object store for the initial store. Reading a field substitutes the value held by the
addressed instance variable for the left-hand-side variable. Writing a field
replaces the initial object store by the updated store:

cast-axiom:
\[ \triangleright\ \{\, \tau(e) \preceq T \wedge P[e/x] \,\}\ \ x = (T)\, e;\ \ \{\, P \,\} \]

constructor-axiom:
\[ \triangleright\ \{\, P[\mathit{new}(\$, T)/x,\ \$\langle T \rangle/\$] \,\}\ \ x = \mathtt{new}\ T();\ \ \{\, P \,\} \]

field-read-axiom:
\[ \triangleright\ \{\, y \neq \mathit{null} \wedge P[\$(\mathit{instvar}(y, S@a))/x] \,\}\ \ x = y.S@a;\ \ \{\, P \,\} \]

field-write-axiom:
\[ \triangleright\ \{\, y \neq \mathit{null} \wedge P[\$\langle \mathit{instvar}(y, S@a) := e \rangle/\$] \,\}\ \ y.S@a = e;\ \ \{\, P \,\} \]
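
As a small illustration (a hypothetical instance, not taken from the paper), instantiating the cast-axiom with the postcondition P ≡ (x ≠ null) gives a triple whose precondition is obtained purely by substitution, plus the type check that rules out an illegal conversion:

```latex
\[
  \triangleright\ \{\, \tau(e) \preceq T \,\wedge\, e \neq \mathit{null} \,\}
  \ \ x = (T)\, e;\ \
  \{\, x \neq \mathit{null} \,\}
\]
```

Here P[e/x] is (x ≠ null)[e/x] = (e ≠ null); the conjunct τ(e) ⪯ T is the additional part that distinguishes the cast-axiom from the classical assignment axiom.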



The invocation-rule uses properties of virtual methods to verify invocations of 
dynamically bound methods. The fact that local variables different from the 
left-hand-side variable are not modified by an invocation is expressed by the 
invocation-var-rule that allows one to substitute logical variables Z in pre- and 
postconditions by local variables w (w different from x) : 

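invocation-rule:
\[ \frac{\mathcal{A} \triangleright \{P\}\ T{:}m\ \{Q\}}
        {\mathcal{A} \triangleright \{\, y \neq \mathit{null} \wedge P[y/\mathit{this}, e/p] \,\}\ \ x = y.T{:}m(e);\ \ \{\, Q[x/\mathit{result}] \,\}} \]

invocation-var-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \ x = y.T{:}m(e);\ \ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P[w/Z] \,\}\ \ x = y.T{:}m(e);\ \ \{\, Q[w/Z] \,\}} \]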



Static methods are bound statically. Therefore, method implementations are 
used instead of virtual methods to verify invocations. In a similar way, method 
implementations are used to verify calls of private methods and invocations using 
super. In both cases, the implementation to be executed can be determined 
statically. The var-rules for static invocations and calls can be found in Fig. 1. 



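static-invoc-rule:
\[ \frac{\mathcal{A} \triangleright \{P\}\ T@m\ \{Q\}}
        {\mathcal{A} \triangleright \{\, P[e/p] \,\}\ \ x = T.m(e);\ \ \{\, Q[x/\mathit{result}] \,\}} \]

call-rule:
\[ \frac{\mathcal{A} \triangleright \{P\}\ T@m\ \{Q\}}
        {\mathcal{A} \triangleright \{\, y \neq \mathit{null} \wedge P[y/\mathit{this}, e/p] \,\}\ \ x = y.T@m(e);\ \ \{\, Q[x/\mathit{result}] \,\}} \]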



Methods. This paragraph presents the rules to prove properties of method im- 
plementations and virtual methods. Essentially, an annotation of a method im- 
plementation m holds if it holds for its body. In order to handle recursion, the 
method annotation may be assumed for the proof of the body. Informally, this 
is sound, because in any terminating execution, the last incarnation does not 
contain a recursive invocation of the method: 

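implementation-rule:
\[ \frac{\mathcal{A},\ \{P\}\ T@m\ \{Q\}\ \triangleright\ \{\, \mathit{this} \neq \mathit{null} \wedge P \,\}\ \mathit{body}(T@m)\ \{Q\}}
        {\mathcal{A} \triangleright \{P\}\ T@m\ \{Q\}} \]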

Virtual methods have been introduced to model dynamically bound methods,
i.e., a method annotation for T:m reflects the common properties of all
implementations that might be executed on invocation of T:m. If T is a class, there are
two obligations to prove an annotation A of a virtual method T:m: 1. Show that
the corresponding implementation satisfies A if invoked for objects of type T.
2. Show that A holds for objects of proper subtypes of T. The second obligation
and annotations of interface type methods can be proved by the subtype-rule:
If S is a subtype of T, an invocation of T:m on an S object is equivalent to an
invocation of S:m. Thus, all properties of S:m carry over to T:m as long as T:m
is applied to objects of type S:

class-rule:
\[ \frac{\mathcal{A} \triangleright \{\, \tau(\mathit{this}) = T \wedge P \,\}\ \mathit{impl}(T, m)\ \{Q\}
         \qquad
         \mathcal{A} \triangleright \{\, \tau(\mathit{this}) \prec T \wedge P \,\}\ T{:}m\ \{Q\}}
        {\mathcal{A} \triangleright \{\, \tau(\mathit{this}) \preceq T \wedge P \,\}\ T{:}m\ \{Q\}} \]

subtype-rule:
\[ \frac{S \preceq T
         \qquad
         \mathcal{A} \triangleright \{\, \tau(\mathit{this}) \preceq S \wedge P \,\}\ S{:}m\ \{Q\}}
        {\mathcal{A} \triangleright \{\, \tau(\mathit{this}) \preceq S \wedge P \,\}\ T{:}m\ \{Q\}} \]



The subtype-rule enables one to prove an annotation for a particular subtype S.
To prove the sequent 𝒜 ▷ { τ(this) ≺ T ∧ P } T:m { Q }, let us first assume that
the given program is not open to further extensions, i.e., all proper subtypes
S_1, ..., S_k of T are known. Based on a complete axiomatization of the (finite) subtype
relation, we can derive τ(this) ≺ T ⇔ τ(this) ⪯ S_1 ∨ ... ∨ τ(this) ⪯ S_k. Thus, we can prove
the sequent by applying the subtype-rule for all subtypes of T and by using the
disjunct-rule (see Fig. 1) and strengthening with the above equivalence.
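
For a closed program in which T has exactly two proper subtypes S_1 and S_2 (a hypothetical situation chosen for illustration), the derivation described above might be laid out as follows. The final appeal to the weak-rule, which collapses Q ∨ Q to Q, is not mentioned in the text and is an assumption of this sketch.

```latex
\begin{align*}
 &\mathcal{A} \triangleright \{\tau(\mathit{this}) \preceq S_i \wedge P\}\ S_i{:}m\ \{Q\}
   && (i = 1, 2;\ \text{assumed already proved}) \\
 &\mathcal{A} \triangleright \{\tau(\mathit{this}) \preceq S_i \wedge P\}\ T{:}m\ \{Q\}
   && \text{subtype-rule, using } S_i \preceq T \\
 &\mathcal{A} \triangleright \{(\tau(\mathit{this}) \preceq S_1 \wedge P) \vee (\tau(\mathit{this}) \preceq S_2 \wedge P)\}\ T{:}m\ \{Q \vee Q\}
   && \text{disjunct-rule} \\
 &\mathcal{A} \triangleright \{\tau(\mathit{this}) \prec T \wedge P\}\ T{:}m\ \{Q\}
   && \text{strength-rule, weak-rule}
\end{align*}
```

The strength-rule step uses the equivalence τ(this) ≺ T ⇔ τ(this) ⪯ S_1 ∨ τ(this) ⪯ S_2 to show that the new precondition implies the disjunction.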

Usually, object-oriented programs are open to extensions; i.e., they are de- 
signed to be used as parts of bigger programs containing additional subtypes. 
Typical examples of such open programs are libraries. Intuitively, open
OO-programs are more difficult to verify because extensions can influence the
behavior of virtual methods. To handle open programs, the proof obligations for
subtypes added later are collected. When a subtype is added, the corresponding
obligations have to be shown. A detailed discussion of this topic and a technique
for treating such obligations as assumptions are given in [PHM98].



Language-Independent Axioms and Rules. Besides the axiomatic seman- 
tics, the programming logic for Java contains language-independent axioms and 
rules to handle assumptions and to establish a connection between the predicate 
logic of pre- and postconditions and triples of the programming logic (cf. Fig. 1). 



4 Towards Formal Soundness Proofs for Complex 
Programming Logics 

The last sections presented two definitions of the semantics of Java-K. The ad- 
vantage of the operational semantics is that its rules can be used to generate 
interpreters for validating and testing the language definition (cf. [BCD+89]). 
The axiomatic definition can be considered as a higher-level semantics and is 
better suited for verification of program properties. Its soundness should be 
proved w.r.t. the operational semantics. 

Since such soundness proofs can be quite long for full-size programming lan- 
guages, it is desirable to enable mechanical proof checking (cf. [vON98] for the 
corresponding argumentation about type safety proofs) . That is why we built on 
the techniques developed by Gordon in [Gor89]: Both semantics are embedded 
into a higher-order logic in which the axioms and rules of the axiomatic semantics 
are derived from those of the operational semantics. The application of Gordon’s 
technique to Java-K made extensions necessary: a systematic treatment of SOS 
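rules, and handling of virtual methods and recursion.

while-rule:
\[ \frac{\mathcal{A} \triangleright \{\, e = b(\mathit{true}) \wedge P \,\}\ \mathit{stm}\ \{\, P \,\}}
        {\mathcal{A} \triangleright \{\, P \,\}\ \mathtt{while}\ (e)\ \{\ \mathit{stm}\ \}\ \{\, e = b(\mathit{false}) \wedge P \,\}} \]

if-rule:
\[ \frac{\mathcal{A} \triangleright \{\, e = b(\mathit{true}) \wedge P \,\}\ \mathit{stm1}\ \{\, Q \,\}
         \qquad
         \mathcal{A} \triangleright \{\, e = b(\mathit{false}) \wedge P \,\}\ \mathit{stm2}\ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P \,\}\ \mathtt{if}\ (e)\ \{\ \mathit{stm1}\ \}\ \mathtt{else}\ \{\ \mathit{stm2}\ \}\ \{\, Q \,\}} \]

call-var-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \ x = y.T@m(e);\ \ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P[w/Z] \,\}\ \ x = y.T@m(e);\ \ \{\, Q[w/Z] \,\}} \]
where x and w are distinct program variables and Z is an arbitrary logical variable.

false-axiom:
\[ \triangleright\ \{\, \mathit{FALSE} \,\}\ \mathit{comp}\ \{\, \mathit{FALSE} \,\} \]

assumpt-intro-rule:
\[ \frac{\mathcal{A} \triangleright A}{A_0,\ \mathcal{A} \triangleright A} \]

conjunct-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P_1 \,\}\ \mathit{comp}\ \{\, Q_1 \,\}
         \qquad
         \mathcal{A} \triangleright \{\, P_2 \,\}\ \mathit{comp}\ \{\, Q_2 \,\}}
        {\mathcal{A} \triangleright \{\, P_1 \wedge P_2 \,\}\ \mathit{comp}\ \{\, Q_1 \wedge Q_2 \,\}} \]

strength-rule:
\[ \frac{P' \Rightarrow P
         \qquad
         \mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P' \,\}\ \mathit{comp}\ \{\, Q \,\}} \]

inv-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P \wedge R \,\}\ \mathit{comp}\ \{\, Q \wedge R \,\}} \]
where R is a Σ-formula, i.e., does not contain program variables or $.

all-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P[Y/Z] \,\}\ \mathit{comp}\ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P[Y/Z] \,\}\ \mathit{comp}\ \{\, \forall Z : Q \,\}} \]
where Z, Y are arbitrary, but distinct logical variables.

seq-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \mathit{stm1}\ \{\, Q \,\}
         \qquad
         \mathcal{A} \triangleright \{\, Q \,\}\ \mathit{stm2}\ \{\, R \,\}}
        {\mathcal{A} \triangleright \{\, P \,\}\ \mathit{stm1}\ \mathit{stm2}\ \{\, R \,\}} \]

static-invoc-var-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \ x = T.m(e);\ \ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P[w/Z] \,\}\ \ x = T.m(e);\ \ \{\, Q[w/Z] \,\}} \]
where x and w are distinct program variables and Z is an arbitrary logical variable.

assumpt-axiom:
\[ A \triangleright A \]

assumpt-elim-rule:
\[ \frac{\mathcal{A} \triangleright A_0
         \qquad
         A_0,\ \mathcal{A} \triangleright A}
        {\mathcal{A} \triangleright A} \]

disjunct-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P_1 \,\}\ \mathit{comp}\ \{\, Q_1 \,\}
         \qquad
         \mathcal{A} \triangleright \{\, P_2 \,\}\ \mathit{comp}\ \{\, Q_2 \,\}}
        {\mathcal{A} \triangleright \{\, P_1 \vee P_2 \,\}\ \mathit{comp}\ \{\, Q_1 \vee Q_2 \,\}} \]

weak-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q \,\}
         \qquad
         Q \Rightarrow Q'}
        {\mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q' \,\}} \]

subst-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q \,\}}
        {\mathcal{A} \triangleright \{\, P[t/Z] \,\}\ \mathit{comp}\ \{\, Q[t/Z] \,\}} \]
where Z is an arbitrary logical variable and t a Σ-term.

ex-rule:
\[ \frac{\mathcal{A} \triangleright \{\, P \,\}\ \mathit{comp}\ \{\, Q[Y/Z] \,\}}
        {\mathcal{A} \triangleright \{\, \exists Z : P \,\}\ \mathit{comp}\ \{\, Q[Y/Z] \,\}} \]
where Z, Y are arbitrary, but distinct logical variables.

Fig. 1. Additional axioms and rules

This section outlines the translation of SOS rules into higher-order formulas;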
it embeds the programming logic of Java-K into higher-order logic and relates it 
to the operational semantics. Furthermore, it presents the soundness proof for 
the most interesting rules of the programming logic. 



SOS rules in HOL. The SOS rules can be directly translated into a recursive 
predicate definition of the form: 

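\[ \mathit{sem}(S_i, \mathit{stm}, S_t) \ \Leftrightarrow_{def}\ \bigvee_{R \,\in\, \text{SOS-rules}} \big(\, \mathit{stm} \text{ matches } \mathit{stmpattern}(R) \ \wedge\ \mathit{antecedents}(R) \,\big) \]

where stmpattern(R) is the statement pattern occurring in the succedent of R
and antecedents(R) denotes the antecedents of the rule where free occurrences
of logical variables are existentially bound. E.g., the SOS rule for virtual method
invocation is transformed to:

\[ \begin{aligned}
 & \mathit{stm} \text{ matches } (x = y.T{:}m(e);) \ \wedge\ \exists S' :\ S_i(y) \neq \mathit{null} \ \wedge\ \tau(S_i(y)) \preceq T \\
 & \wedge\ \mathit{sem}(\mathit{initS}[\mathit{this} := S_i(y),\ p := \varepsilon(S_i, e),\ \$ := S_i(\$)],\ \mathit{body}(\mathit{impl}(\tau(S_i(y)), m)),\ S') \\
 & \wedge\ S_t = S_i[x := S'(\mathit{result}),\ \$ := S'(\$)]
\end{aligned} \]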

The semantics of Java-K is given by the least fixpoint of the defining equivalence 
for sem. To simplify inductive proofs and the embedding of sequents into the 
semantics framework, we introduce an auxiliary semantics predicate nsem with 
an additional parameter of sort Nat: 

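\[ \mathit{nsem}(N, S_i, \mathit{stm}, S_t) \ \Leftrightarrow_{def}\ \bigvee_{R \,\in\, \text{SOS-rules}} \big(\, \mathit{stm} \text{ matches } \mathit{stmpattern}(R) \ \wedge\ \mathit{antecedents}_{\mathit{nsem}}(R) \,\big) \]

where antecedents_nsem(R) is obtained from antecedents(R) by substituting all
occurrences of sem(S, stm, S') by N > 0 ∧ nsem(N − 1, S, stm, S'). It is easy to
show that nsem is monotone w.r.t. N, i.e., nsem(N, S_i, stm, S_t) ⇒ nsem(N + 1,
S_i, stm, S_t). The following lemma relates sem and nsem:

\[ \mathit{sem}(S_i, \mathit{stm}, S_t) \ \Leftrightarrow\ \exists N : \mathit{nsem}(N, S_i, \mathit{stm}, S_t) \]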
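
The depth-indexed predicate can be made concrete with a small Python sketch. It is not the Java-K semantics: it assumes a hypothetical deterministic mini-language (assignment, sequence, if, while) with states as dictionaries, so that nsem becomes a partial function returning the final state, or None when no derivation fits into depth n. Rules without sem-antecedents hold for every n; every other rule requires n > 0 and uses n − 1 for its antecedents.

```python
def nsem(n, state, stm):
    """Final state if a derivation with nesting depth at most n exists, else None."""
    kind = stm[0]
    if kind == "assign":  # axiom without sem-antecedent: holds for every n
        _, var, expr = stm
        return {**state, var: expr(state)}
    if kind == "while" and not stm[1](state):  # while-false rule, also an axiom
        return dict(state)
    if n == 0:  # all remaining rules have sem-antecedents, which need n > 0
        return None
    if kind == "seq":
        mid = nsem(n - 1, state, stm[1])
        return None if mid is None else nsem(n - 1, mid, stm[2])
    if kind == "if":
        _, cond, then_branch, else_branch = stm
        return nsem(n - 1, state, then_branch if cond(state) else else_branch)
    if kind == "while":  # the guard is true here
        mid = nsem(n - 1, state, stm[2])
        return None if mid is None else nsem(n - 1, mid, stm)
    raise ValueError(f"unknown statement: {kind}")


def sem(state, stm, max_depth=100):
    """Bounded search for the N of the lemma sem <=> exists N. nsem(N, ...)."""
    for n in range(max_depth + 1):
        result = nsem(n, state, stm)
        if result is not None:
            return result
    return None  # no derivation found within the bound (e.g. divergence)


# while (x > 0) { x = x - 1; }
countdown = ("while", lambda s: s["x"] > 0, ("assign", "x", lambda s: s["x"] - 1))
```

Starting from x = 3, the loop needs depth 3: a smaller bound yields no derivation, and every larger bound yields the same final state, which is the monotonicity property stated above.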

Semantics for Triples and Sequents. To embed triples into HOL, we con- 
sider the pre- and postconditions as predicates on states, i.e., as functions from 
State to Boolean. A triple of the form { P } comp { Q } is viewed as an abbreviation
for H(λS.P*, comp, λS.Q*) where P* and Q* are obtained from P and
Q by substituting all occurrences of program variables v, parameters p, and the
constant symbol $ by S(v), S(p), and S($) (for simplicity, we assume here that
S does not occur in P or Q). Based on this syntactical embedding, we can define
the semantics of triples in terms of sem:

\[ \begin{aligned}
 H(P, \mathit{stm}, Q) \ &\Leftrightarrow_{def}\ \forall S, S' :\ P(S) \wedge \mathit{sem}(S, \mathit{stm}, S') \Rightarrow Q(S') \\
 H(P, T@m, Q) \ &\Leftrightarrow_{def}\ H(\lambda S.\ S(\mathit{this}) \neq \mathit{null} \wedge P(S),\ \mathit{body}(T@m),\ Q) \\
 H(P, T_0{:}m, Q) \ &\Leftrightarrow_{def}\ \bigwedge_{T \preceq T_0} H(\lambda S.\ \tau(S(\mathit{this})) = T \wedge P(S),\ \mathit{impl}(T, m),\ Q)
\end{aligned} \]

The first equivalence formulates the usual meaning of Hoare triples for state- 
ments (cf. [Gor89]). The second defines implementation annotations in terms of 
the method body. The third expresses the concept of virtual methods: A vir- 
tual method abstracts the properties of all corresponding implementations. The 
conjunct ranges over all class types T that are subtypes of T_0.

The most interesting aspect of the embedding is the treatment of sequents and
rules. Sequents cannot be directly translated into implications with assumptions 
as premises and consequents as conclusions. For the implementation-rule, this 
translation would lead to the following incorrect rule: 

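\[ \frac{\mathcal{A} \ \wedge\ H(P, T@m, Q) \ \Rightarrow\ H(\lambda S.\ S(\mathit{this}) \neq \mathit{null} \wedge P(S),\ \mathit{body}(T@m),\ Q)}
        {\mathcal{A} \ \Rightarrow\ H(P, T@m, Q)} \]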

Using the second equivalence, we can show that the antecedent is a tautology. 
Since A can be empty, the rule would allow one to prove that implementations 
satisfy arbitrary properties. The implementation-rule implicitly contains an in- 
ductive argument that has to be made explicit in the embedding. This is done 
using a predicate K that is related to nsem just as H is related to sem: 

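\[ \begin{aligned}
 K(N, P, \mathit{stm}, Q) \ &\Leftrightarrow_{def}\ \forall S, S' :\ P(S) \wedge \mathit{nsem}(N, S, \mathit{stm}, S') \Rightarrow Q(S') \\
 K(0, P, T@m, Q) \ &\Leftrightarrow_{def}\ \mathit{true} \\
 K(N + 1, P, T@m, Q) \ &\Leftrightarrow_{def}\ K(N,\ \lambda S.\ S(\mathit{this}) \neq \mathit{null} \wedge P(S),\ \mathit{body}(T@m),\ Q) \\
 K(N, P, T_0{:}m, Q) \ &\Leftrightarrow_{def}\ \bigwedge_{T \preceq T_0} K(N,\ \lambda S.\ \tau(S(\mathit{this})) = T \wedge P(S),\ \mathit{impl}(T, m),\ Q)
\end{aligned} \]

Using the lemma that relates sem and nsem, it is easy to show that

\[ H(P, \mathit{comp}, Q) \ \Leftrightarrow\ \forall N : K(N, P, \mathit{comp}, Q) \]

Based on K, sequents can be directly embedded into HOL. A sequent of the form
{P_1} m_1 {Q_1}, ..., {P_l} m_l {Q_l} ▷ {P} comp {Q} is considered as an abbreviation for

\[ \forall N :\ \big(\, K(N, P_1, m_1, Q_1) \wedge \ldots \wedge K(N, P_l, m_l, Q_l) \ \Rightarrow\ K(N, P, \mathit{comp}, Q) \,\big) \]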

Because of the relation between H and K, a sequent without assumptions is 
equivalent to the semantics of triples described by H. The complexity of the 
embedding is a strong argument for using Hoare rules in practical verification 
instead of the axiomatization of the operational semantics. Many of the proof 
steps encapsulated in the soundness proof have to be done again and again when 
verification is directly based on the rules of the operational semantics. 
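
The role of the depth index can be illustrated with a minimal Python sketch. It is a simplification, not the embedding itself: a hypothetical method body is a Python function that receives a callback for recursive invocations, each invocation consumes one unit of depth, and K is checked only on a finite sample of inputs. The off-by-one details of the paper's definition (where body statements and invocations each decrement N) are deliberately glossed over.

```python
class OutOfDepth(Exception):
    """Raised when no derivation fits into the given depth bound."""


def run(n, body, arg):
    """Execute a method body; each (recursive) invocation consumes one unit of n."""
    if n == 0:
        raise OutOfDepth
    return body(lambda a: run(n - 1, body, a), arg)


def K(n, pre, body, post, inputs):
    """Depth-indexed partial correctness over a finite sample of inputs."""
    for a in inputs:
        if not pre(a):
            continue
        try:
            result = run(n, body, a)
        except OutOfDepth:
            continue  # no execution of depth n exists: nothing to check
        if not post(a, result):
            return False
    return True


def H(pre, body, post, inputs, max_depth=50):
    """Bounded stand-in for H <=> forall N. K(N, ...)."""
    return all(K(n, pre, body, post, inputs) for n in range(max_depth + 1))


# Hypothetical recursive method: m(p) returns 0 + 1 + ... + p.
body = lambda call, p: 0 if p == 0 else p + call(p - 1)
pre = lambda p: p >= 0
good_post = lambda p, r: r == p * (p + 1) // 2
bad_post = lambda p, r: r == p  # holds only for p in {0, 1}
```

K(0, ...) holds vacuously for any postcondition, and a wrong postcondition is refuted only once the depth is large enough to reach a violating execution. This is why a sequent quantifies over all N rather than treating the assumption as an ordinary premise.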



Soundness of the Programming Logic. In the last paragraph, we formal- 
ized the semantics of triples and sequents in terms of sem and nsem. Having a 
semantics for the sequents, we can prove the soundness of the Java-K logic. The 
embedding into HOL was chosen in such a way that the soundness proof can be 
done separately for each logical rule. We illustrate the needed proof techniques 
by showing the soundness of the implementation- and the invocation-rule. In the 
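proofs, we abbreviate S(x) ≠ null by ν(x).

implementation-rule. The soundness proof of the implementation-rule illustrates
the implicit inductive argument of that rule and demonstrates the treatment of
assumptions. We show:

\[ \begin{aligned}
 & \big( \forall M :\ \mathcal{A}(M) \wedge K(M, P, T@m, Q) \ \Rightarrow\ K(M,\ \lambda S.\ \nu(\mathit{this}) \wedge P(S),\ \mathit{body}(T@m),\ Q) \big) \\
 & \Rightarrow\ \forall N :\ \mathcal{A}(N) \Rightarrow K(N, P, T@m, Q)
\end{aligned} \]

where 𝒜(L) denotes the conjunction of the embedded assumptions and P and Q
abbreviate λS.P* and λS.Q*, respectively. The proof runs by induction on N:

Induction base for N = 0: K(0, P, T@m, Q) is true by definition of K.
Induction step: Assume that the hypothesis holds for N.

\[ \begin{array}{ll}
 & \big( \forall M :\ \mathcal{A}(M) \wedge K(M, P, T@m, Q) \Rightarrow K(M,\ \lambda S.\ \nu(\mathit{this}) \wedge P(S),\ \mathit{body}(T@m),\ Q) \big) \\
 \Rightarrow & [\text{Conjoining the induction hypothesis}] \\
 & \big( \forall M :\ \mathcal{A}(M) \wedge K(M, P, T@m, Q) \Rightarrow K(M,\ \lambda S.\ \nu(\mathit{this}) \wedge P(S),\ \mathit{body}(T@m),\ Q) \big) \\
 & \wedge\ \big( \mathcal{A}(N) \Rightarrow K(N, P, T@m, Q) \big) \\
 \Rightarrow & [\text{Instantiate } M \text{ by } N \text{ \& propositional logic}] \\
 & \mathcal{A}(N) \Rightarrow K(N,\ \lambda S.\ \nu(\mathit{this}) \wedge P(S),\ \mathit{body}(T@m),\ Q) \\
 \Rightarrow & [\text{Definition of } K] \\
 & \mathcal{A}(N) \Rightarrow K(N + 1, P, T@m, Q) \\
 \Rightarrow & [\mathcal{A}(N + 1) \Rightarrow \mathcal{A}(N), \text{ see below}] \\
 & \mathcal{A}(N + 1) \Rightarrow K(N + 1, P, T@m, Q)
\end{array} \]

The implication 𝒜(N + 1) ⇒ 𝒜(N) follows from the definition of K and the
monotonicity of nsem.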

invocation-rule. The soundness proof of the invocation-rule demonstrates how 
substitution is handled and why the restrictions on the signatures of pre- and 
postcondition formulas are necessary. We simplify the proof a bit by leaving out 
the assumptions in the antecedent and succedent of the rule. The extension to 
the complete proof is straightforward. Thus, we have to show: 

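\[ \begin{aligned}
 & \forall M :\ K(M,\ \lambda S.P^*,\ T{:}m,\ \lambda S.Q^*) \\
 & \Rightarrow\ \forall N :\ K(N,\ \lambda S.\ \nu(y) \wedge (P[y/\mathit{this}, e/p])^*,\ \ x = y.T{:}m(e);,\ \ \lambda S.(Q[x/\mathit{result}])^*)
\end{aligned} \]

Assuming the premise, we prove the conclusion, i.e., for arbitrary N, S, S'':

\[ \nu(y) \wedge (P[y/\mathit{this}, e/p])^* \wedge \mathit{nsem}(N, S,\ x = y.T{:}m(e);,\ S'') \ \Rightarrow\ (\lambda S.(Q[x/\mathit{result}])^*)(S'') \]

This is proved by case distinction on N:

Case N = 0: From the definition of nsem we get that the premise is false.

Case N > 0: The following lemma relates substitution and state update:

\[ (P[t_1/x_1, \ldots, t_n/x_n])^* = (\lambda S.P^*)(S[x_1 := \varepsilon(S, t_1), \ldots, x_n := \varepsilon(S, t_n)]) \]

By this lemma and with P for λS.P* and Q for λS.Q*, the proof goal becomes:

\[ \begin{aligned}
 & \nu(y) \wedge P(S[\mathit{this} := S(y),\ p := \varepsilon(S, e)]) \wedge \mathit{nsem}(N, S,\ x = y.T{:}m(e);,\ S'') \\
 & \Rightarrow\ Q(S''[\mathit{result} := S''(x)])
\end{aligned} \]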

To show this, we will use the following implication (+):

\[ \nu(y) \wedge P(\sigma) \wedge \tau(S(y)) \preceq T \wedge \mathit{nsem}(N - 1,\ \sigma,\ \mathit{body}(\mathit{impl}(\tau(S(y)), m)),\ S') \ \Rightarrow\ Q(S') \]

where σ abbreviates initS[this := S(y), p := ε(S, e), $ := S($)]. The proof of
(+) uses the general proof assumption ∀M : K(M, P, T:m, Q), the definition of
K, and the fact that τ(S(y)) is a class type. τ(S(y)) is a class type, because
the context conditions of Java/Java-K imply that y is of a reference type and
because Java and thus Java-K are type-safe. In addition to (+), we need the fact
that for any state S_0 the value of P(S_0) only depends on S_0(this), S_0(p), and
S_0($), because other variables are not allowed within P (cf. Sect. 3), i.e.,

\[ S_0(\mathit{this}) = S_1(\mathit{this}) \wedge S_0(p) = S_1(p) \wedge S_0(\$) = S_1(\$) \ \Rightarrow\ P(S_0) = P(S_1) \]

Similarly, Q only depends on S_0(result) and S_0($). By this, the remaining goal
can be proved as follows:

\[ \begin{array}{ll}
 & \nu(y) \wedge P(S[\mathit{this} := S(y),\ p := \varepsilon(S, e)]) \wedge \mathit{nsem}(N, S,\ x = y.T{:}m(e);,\ S'') \\
 \Rightarrow & [\text{Definition of } \mathit{nsem} \text{ (disjunct for “x=y.T:m(e);”); cf. paragraph “SOS in HOL”}] \\
 & \nu(y) \wedge P(S[\mathit{this} := S(y),\ p := \varepsilon(S, e)]) \wedge N > 0 \wedge \exists S' :\ \nu(y) \wedge \tau(S(y)) \preceq T \\
 & \wedge\ \mathit{nsem}(N - 1,\ \sigma,\ \mathit{body}(\mathit{impl}(\tau(S(y)), m)),\ S') \wedge S'' = S[x := S'(\mathit{result}),\ \$ := S'(\$)] \\
 \Rightarrow & [\text{case assumption } N > 0; \text{ general logic}] \\
 & \exists S' :\ S'' = S[x := S'(\mathit{result}),\ \$ := S'(\$)] \\
 & \wedge\ \nu(y) \wedge P(S[\mathit{this} := S(y),\ p := \varepsilon(S, e)]) \wedge \tau(S(y)) \preceq T \\
 & \wedge\ \mathit{nsem}(N - 1,\ \sigma,\ \mathit{body}(\mathit{impl}(\tau(S(y)), m)),\ S') \\
 \Rightarrow & [P(S[\mathit{this} := S(y),\ p := \varepsilon(S, e)]) = P(\sigma); \text{ lemma } (+)] \\
 & \exists S' :\ S'' = S[x := S'(\mathit{result}),\ \$ := S'(\$)] \wedge Q(S') \\
 \Rightarrow & [S'(\mathit{result}) = S''(x) = S''[\mathit{result} := S''(x)](\mathit{result}),\ \ S'(\$) = S''(\$) = S''[\mathit{result} := S''(x)](\$)] \\
 & \exists S' :\ Q(S''[\mathit{result} := S''(x)]) \\
 \Rightarrow & \\
 & Q(S''[\mathit{result} := S''(x)])
\end{array} \]
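
The lemma relating substitution and state update, used in the case N > 0 above, can be checked on concrete data with a small Python sketch. The term representation (nested tuples, variables as strings, integer constants) and the function names are assumptions made for illustration; ev plays the role of both ε(S, t) and the embedding P*.

```python
def ev(term, s):
    """Evaluate a term in state s; plays the role of epsilon(S, t) and of P*."""
    if isinstance(term, str):    # variable
        return s[term]
    if isinstance(term, tuple):  # (operator, left, right)
        op, left, right = term
        a, b = ev(left, s), ev(right, s)
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        if op == ">":
            return a > b
        raise ValueError(op)
    return term                  # integer constant


def subst(term, mapping):
    """Simultaneous syntactic substitution P[t1/x1, ..., tn/xn]."""
    if isinstance(term, str):
        return mapping.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(subst(t, mapping) for t in term[1:])
    return term


def update(s, mapping):
    """State update S[x1 := eps(S, t1), ..., xn := eps(S, tn)]."""
    return {**s, **{x: ev(t, s) for x, t in mapping.items()}}


# P is x + p > 3; substitute 2*y for x and x for p simultaneously.
P = (">", ("+", "x", "p"), 3)
mapping = {"x": ("*", "y", 2), "p": "x"}
```

Evaluating the substituted formula in a state S gives the same truth value as evaluating the original formula in the updated state, which is the step that turns P[y/this, e/p] into P applied to S[this := S(y), p := ε(S, e)].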



5 Conclusions 

We introduced the sequential Java subset Java-K, which provides the typical 
OO-language features such as classes and interfaces, subtyping, inheritance,
dynamic dispatch, and encapsulation. Based on a formalization of object stores as
first-order values, we presented a Hoare-style programming logic for Java-K. A 
central concept of this logic is the notion of virtual methods to handle overriding 
and dynamic dispatch. Virtual methods represent the common properties of all 
corresponding subtype methods. We showed how virtual methods and method 
implementations can be used to cover statically and dynamically bound method 
invocations, subtyping, and inheritance in programming logics. 

The logic has been proved sound w.r.t. an SOS semantics of Java-K. Fol- 
lowing the ideas of [Gor89], we embedded both semantics into a higher-order 
logic and derived the axioms and rules of the programming logic from those of 
the operational semantics. We presented the proofs for two typical rules. This 
technique for soundness proofs provides a good basis for applying proof checkers. 
Mechanical checking of the soundness proof is left as future work.

Appendix

These are the SOS rules for static method invocations, constructor calls, sequential
statement composition, cast, while, and if statements:

\[ \frac{\mathit{initS}[p := \varepsilon(S, e),\ \$ := S(\$)] : \mathit{body}(T@m) \rightarrow S'}
        {S : x = T.m(e); \rightarrow S[x := S'(\mathit{result}),\ \$ := S'(\$)]} \]

\[ \frac{\mathit{true}}
        {S : x = \mathtt{new}\ T(); \rightarrow S[x := \mathit{new}(S(\$), T),\ \$ := S(\$)\langle T \rangle]} \]

\[ \frac{S : \mathit{stm1} \rightarrow S' \qquad S' : \mathit{stm2} \rightarrow S''}
        {S : \mathit{stm1}\ \mathit{stm2} \rightarrow S''}
   \qquad\qquad
   \frac{\tau(\varepsilon(S, e)) \preceq T}
        {S : x = (T)\,e; \rightarrow S[x := \varepsilon(S, e)]} \]

\[ \frac{\varepsilon(S, e) = b(\mathit{true}) \qquad S : \mathit{stm} \rightarrow S' \qquad S' : \mathtt{while}(e)\{\mathit{stm}\} \rightarrow S''}
        {S : \mathtt{while}(e)\{\mathit{stm}\} \rightarrow S''}
   \qquad\qquad
   \frac{\varepsilon(S, e) = b(\mathit{false})}
        {S : \mathtt{while}(e)\{\mathit{stm}\} \rightarrow S} \]

\[ \frac{\varepsilon(S, e) = b(\mathit{true}) \qquad S : \mathit{stm1} \rightarrow S'}
        {S : \mathtt{if}(e)\{\mathit{stm1}\}\ \mathtt{else}\{\mathit{stm2}\} \rightarrow S'}
   \qquad\qquad
   \frac{\varepsilon(S, e) = b(\mathit{false}) \qquad S : \mathit{stm2} \rightarrow S'}
        {S : \mathtt{if}(e)\{\mathit{stm1}\}\ \mathtt{else}\{\mathit{stm2}\} \rightarrow S'} \]

Abstract. This paper presents the first approximation method of the
finite-failure set of a logic program by set-based analysis. In a dual view,
the method yields a type analysis for programs with ongoing behaviors
(perpetual processes). Our technical contributions are (1) the semantical
characterization of finite failure of logic programs over infinite trees and
(2) the design and soundness proof of the first set-based analysis of logic
programs with the greatest-model semantics. Finally, we exhibit the connection
between finite failure and the inevitability of the ‘inconsistent-store’
error in fair executions of concurrent constraint programs where no
process suspends forever. This indicates a potential application to error
diagnosis for concurrent constraint programs.

Keywords: abstract interpretation, set-based program analysis, types, 
logic programs, concurrent constraint programs, finite failure, fairness 



1 Introduction 

Set-based program analysis dates back to Reynolds [35] and Jones and Much- 
nick [27] and forms a well-established research topic by now (see [1,24,34] for 
overviews and further references). It has direct practical applications to type 
inference, optimization and verification of imperative, functional, logic and, as 
we will see in this paper, also concurrent programs. 

In set-based analysis, the problem of reasoning about runtime properties of 
programs is transferred to the problem of solving set constraints. The design of 
a specific analysis involves two steps: (1) define a mapping from a class of programs
P to set constraints φ_P and show the soundness of the abstraction of P
by a distinguished solution of φ_P, and (2) single out a corresponding subclass of
set constraints and devise an efficient algorithm for computing the distinguished 
solution. For instance, Heintze and Jaffar defined a set-based analysis for logic 
programs with the least model semantics in [22]. Their analysis is an approxi- 
mation method for the success set of a logic program, i.e. for the set of initial 
queries for which a successfully terminating execution exists. 

* On leave from University of Wroclaw, Poland. Partially supported by Polish KBN
grant 8T11G02913.

In this paper, we consider the finite failure set of a logic program, i.e. the set
of initial queries for which all fair executions terminate with failure. In order to 
give a sound prediction of finite failure (‘if predicted, it will occur’), we need a 
characterization of finite failure in terms of program semantics. Classical results 
from logic programming, however, only yield the converse, i.e. a characterization 
of the greatest-model semantics in terms of finite failure (see Remark 1). Fortu- 
nately, for programs over the domain of infinite trees we can characterize finite 
failure through the greatest-model semantics (more precisely, its complement; 
see Theorem 1). Since the analysis we design computes an abstraction of that 
semantics, we obtain an approximation method for the finite-failure set of a logic 
program over infinite trees (see Theorem 3). More precisely, the emptiness of the 
computed abstract value for the predicate p indicates the finite failure of every 
predicate call p(x). At the same time, this method can predict finite failure of a
logic program over rational trees, or over finite trees (see Remarks 3 and 5).

In the least-model analysis in [22], Heintze and Jaffar use definite set con- 
straints; they give a corresponding constraint solving algorithm in [21] (see [9] for 
further results). Our analysis uses co-definite set constraints, which bear their 
name in duality to definite set constraints due to the fact that every satisfiable 
constraint in this class has a greatest solution. This fact is crucial for our analy- 
sis. Algorithms for solving co-definite set constraints are given in [4,16]. In this 
paper, we focus on the definition of the analysis and the soundness of the abstraction,
which is: the greatest solution of the co-definite set constraint φ_P inferred
from the program P is a safe approximation of the greatest-model semantics
for P (see Theorem 2).

In a different reading, our abstraction method is a type analysis of logic pro- 
grams with ongoing behavior. Such programs are investigated under the denom- 
ination perpetual processes in [28]. There, the semantics of such a program P is 
defined by the greatest-fixpoint semantics over the domain of infinite trees. Our 
analysis computes the abstraction of this semantics in the form of the greatest 
solution of the inferred co-definite set constraint (the greatest-fixpoint semantics 
is equal to the greatest model of P’s completion [10]). This solution assigns to
every program variable x a set of infinite trees that can be viewed as the type
of x. This type describes a safe approximation (i.e. a superset) of the set of all
possible runtime values for x in ongoing program executions. 

Finally, we consider a potential application to concurrent constraint programs 
(see e.g. [36,37]). We carry over the approximation method of the greatest model 
to cc programs. This yields a type analysis for cc programs in the same sense as 
above. It also yields a failure analysis. In cc programs, an inconsistent constraint 
store (viz., failure) is considered a runtime error. (This is in contrast to logic 
programming where failure is part of the backtracking mechanism.) Our analysis 
computes an approximation of the execution states of cc programs for which 
failure is inevitable in fair executions unless a process (i.e. a predicate call) 
suspends forever (see Theorem 4). The global suspension of a process is not 
necessarily a programming error. That a process must suspend forever in order 
to avoid a runtime error is, however, a problem worth diagnosing and reporting.

Related Work. To our knowledge, set-based analysis for logic programming (see
e.g. [5,18,13,14,22,23,30]) has previously only been designed to approximate the 
success set (which can be characterized by the least model semantics). Mishra’s 
analysis [30] is often cited as the historically first one here. Heintze and Jaffar [23] 
have shown that Mishra’s analysis is less accurate than theirs in two ways, due 
to the choice of the greatest solution for the class of set constraints he considers 
(see Remark 4) and due to the choice of the non-standard interpretation of non- 
empty path-closed sets of finite trees, respectively. Using the techniques in this 
paper, we are able to show that Mishra’s approximation is so weak that it even 
approximates the greatest model. Mishra proves that ‘p(x) will never succeed’
if the set constraint φ_P he derives is unsatisfiable. Our results yield that ‘p(x)
will finitely fail’ if φ_P is unsatisfiable over the domain of non-empty path-closed
sets of infinite trees (see Remark 6). 

Regarding the analysis of concurrent constraint programs, various techniques 
based on abstract interpretation have been used (see e.g. [17]) but none that is 
related to set-based analysis. A first formal calculus for (partial) correctness of cc 
programs is developed in [15]. The proof methods there are more powerful than 
ours but not automatic. The necessity to consider greatest-fixed point semantics 
for the analysis of reactive systems has been observed by other authors and in 
the context of different programming paradigms (see e.g. [11,19]). None of these 
analyses is set-based. 

Finally, we want to mention that the idea to derive necessary conditions for 
the inevitability of a runtime error by static analysis stems from the work of 
Bourdoncle [3] on abstract debugging. 

2 Logic Programs 

Preliminaries. We assume a ranked alphabet Σ fixing the arity n ≥ 0 of its
function symbols f, g, ... and constant symbols a, b, ..., and an infinite set Var
of variables x, y, z, .... We write x̄ for finite sequences of variables, and use
analogous sequence notation for other syntactic entities. We also write f(x̄) for
flat terms, where we assume implicitly that the arity of f equals the length of x̄.
A term without variables is called a ground term. The set of infinite trees over Σ
is denoted by T_Σ^∞. Note that an infinite tree can have finite paths (ending with
a constant symbol); a finite tree is a special case. The set of terms over Σ and
Var is denoted by T_Σ^∞(Var). For an arbitrary formula Φ, we write ∃_{−x}Φ for the
existential closure of Φ with respect to all variables in Φ but x. We also assume
a set Pred of predicate symbols. The Herbrand Base¹ B is the set of all ground
atoms over Pred and T_Σ^∞, i.e., B = { p(t) | p ∈ Pred, t ∈ T_Σ^∞ }.

¹ What we call Herbrand Base is sometimes called Complete Herbrand Base [28] in
order to distinguish it from the classical notion for finite trees.

Logic Programs. A logic program defines predicates through clauses of the
form

\[ p(t) \leftarrow p_1(t_1), \ldots, p_n(t_n) \]

where p(t) is called the head and p_1(t_1), ..., p_n(t_n) is called the body of the
clause. A clause with an empty body is called a fact. A complete program has
the form

\[ \bigwedge_{p \in \mathit{Pred}}\ \bigwedge_{i=1}^{n_p} \Big( p(t_i) \leftarrow \bigwedge_{j=1}^{n_{i,p}} p_{ij}(t_{ij}) \Big) \]

where i ranges over the number n_p of clauses in the definition of predicate p,
and j ranges over the number n_{i,p} of queries in the i-th clause of predicate p. For
better readability, we assume that all predicates are unary; the results can easily
be extended to the case without this restriction (for example, by requiring the
signature to contain at least one binary function symbol).

If we consider the logical semantics of a program of the form above, we take 
the completion of P [10], which is given by the following formula. 



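\[ \mathit{compl}(P) \ \equiv\ \bigwedge_{p \in \mathit{Pred}} \Big( p(x) \leftrightarrow \bigvee_{i=1}^{n_p} \exists_{-x} \big( x = t_i \wedge \bigwedge_{j=1}^{n_{i,p}} p_{ij}(t_{ij}) \big) \Big). \]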

A query s is a conjunction ⋀_k p_k(t_k) where the t_k are terms. We here allow
infinite terms like f(x, f(x, ...)) in order to model execution states with cyclic
unifiers such as y ↦ f(x, y). Such terms can be finitely represented by equations,
e.g. y = f(x, y), or by syntactic annotations as in [2].

A ground query is a query ⋀_k p_k(t_k) such that all t_k are ground (i.e. without
variables). We use the predicate constant true as the neutral element for
conjunction: i.e., s = s ∧ true. In particular, the ‘empty’ query is written as true.

An interpretation ρ (sometimes called a model) is a subset of the Herbrand
Base, ρ ⊆ B. Interpretations are ordered by subset inclusion.

We identify an interpretation ρ ⊆ B with a valuation ρ : Pred → 2^{T_Σ^∞},
i.e. a mapping of predicate symbols to sets of trees such that

\[ \rho(p) = \{\, t \in T_\Sigma^\infty \mid p(t) \in \rho \,\}. \]

A model of the program P is a valuation ρ : Pred → 2^{T_Σ^∞} such that the formula
compl(P) is valid in the usual logical sense.

The greatest model of compl(P), denoted by gm(P), always exists. Using our
convention of identifying the interpretation gm(P) with a valuation, we use the
notation gm(P)(p) for the denotation of the predicate p by the greatest-model
semantics, i.e.

\[ \mathit{gm}(P)(p) = \{\, t \in T_\Sigma^\infty \mid p(t) \in \mathit{gm}(P) \,\}. \]

Operational Semantics. The logic program P defines a fair transition system
𝒯_P = (S, τ_P). The fairness of the transition system is defined by the fairness
of the non-deterministic selection rule (in the classical sense [28]: a selection rule
is fair if every query atom in a state s gets selected eventually, in every execution
starting in s). The non-determinism of the selection rule means that conjunction
corresponds to parallel composition with the interleaving semantics; disjunction
corresponds to non-deterministic choice.

The set $S$ of states of the transition system $\mathcal{T}_P$ consists of all queries (including 
$\mathit{true}$) and the failure state $\mathit{false}$, 

\[ S = \Bigl\{ \bigwedge_k p_k(t_k) \;\Bigm|\; \forall k:\ p_k \in \mathit{Pred},\ t_k \in T_\Sigma^\infty(\mathit{Var}) \Bigr\} \cup \{ \mathit{false} \}. \]

The transition relation $\tau_P \subseteq S \times S$ is defined according to the standard rewriting 
semantics under a fair selection rule. When a selected query atom $p(t)$ in a state 
$s \in S$ of the form $s = s_{rest} \wedge p(t)$ unifies with the head of a clause $p(t_i) \leftarrow 
\bigwedge_j p_{ij}(t_{ij})$, then the state $s'$ obtained as the instantiation of $s_{rest} \wedge \bigwedge_j p_{ij}(t_{ij})$ 
under the most general unifier of $t$ and $t_i$ is a possible successor state of $s$. We 
say that $p(t)$ is applied in the transition step from $s$ to $s'$. When a selected query 
atom $p(t)$ does not unify with any of the heads of the clauses of $p$, then the 
successor state is $\mathit{false}$. 

Similarly, $P$ defines a fair ground transition system $\mathcal{T}_P^{g} = (S, \tau_P^{g})$. We obtain 
the transition relation $\tau_P^{g}$ by modifying the one of $\mathcal{T}_P$: after every transition step 
of $\mathcal{T}_P$, all variables in the successor state are instantiated with ground terms 
(i.e. infinite trees). Note that ground queries are a special case of queries. 

We say that a derivation finitely fails if it ends in the state $\mathit{false}$. A 
query $p(x)$ is finitely failed (and belongs to the set $\mathit{FF}$) if every $\mathcal{T}_P$ derivation 
starting with query $p(x)$ finitely fails. 

\[ \mathit{FF} = \{ p(x) \mid p(x) \text{ is finitely failed} \} \]

Similarly, a ground query $p(t)$ is called ground finitely failed (and belongs to the 
set $\mathit{GFF}$) if every $\mathcal{T}_P^{g}$ derivation starting from $p(t)$ finitely fails. 

\[ \mathit{GFF} = \{ p(t) \mid p(t) \text{ is ground finitely failed} \} \]

We will now characterize the finite failure set of a program P over the domain 
of infinite trees through the greatest model of compl(P). Since we have not found 
this observation in the literature, we will give its proof, drawing from several 
results that are classical in the theory of logic programming. 

Theorem 1 (Characterization of finite failure over infinite trees). 

Given a logic program $P$ over infinite trees, the query $p(x)$ is finitely failed 
if and only if the value of $p$ in the greatest model of $\mathit{compl}(P)$ over the domain 
$T_\Sigma^\infty$ of infinite trees is the empty set; i.e., 

\[ p(x) \in \mathit{FF}(P) \quad\text{if and only if}\quad \mathit{gm}(P)(p) = \emptyset. \]

Proof. The only-if direction is a classical result (namely, the ‘algebraic soundness 
of finite failure’, see [28,25]). 

For the other direction, first note that equations over infinite trees have the 
saturation property, that is, an infinite set of constraints is satisfiable if each of 
its finite subsets is [28,26,33]. 

Now assume that $p(x) \notin \mathit{FF}(P)$. Since (see [28,26]) 

\[ \mathit{gm}(P)(p) = \{ t \mid p(t) \notin \mathit{GFF}(P) \}, \]

it is sufficient to show that there exists an infinite tree $t$ such that $p(t) \notin \mathit{GFF}(P)$ 
(i.e., $p(t)$ is not in the ground finite failure set; note that in general, the ground 
finite failure of a call does not imply finite failure of some ground instance of 
this call.) 

By assumption, there exists an execution starting in the state $p(x)$ that 
does not lead to the failure state. That is, there exists a transition sequence 
$s_0, s_1, s_2, \ldots$ starting in $s_0 = p(x)$ such that the constraint store $\varphi_i$ of every 
state $s_i$ is satisfiable (in the terminology of constraint logic programming [25], a 
state $\bigwedge_k p_k(t_k)$ is written as the pair $\langle \bigwedge_k p_k(x_k), \varphi \rangle$ where the constraint store $\varphi$ 
is a conjunction of equations that is equivalent to $\bigwedge_k x_k = t_k$ over the domain 
of infinite trees). Since $\varphi_i$ is stronger than $\varphi_{i-1}$ for $i \ge 1$, $\varphi_n$ is equivalent to 
$\bigwedge_{i=0}^{n} \varphi_i$. 

Thus, we have a sequence of constraints $\varphi_0, \varphi_1, \varphi_2, \ldots$ such that $\bigwedge_{i=0}^{n} \varphi_i$ is 
satisfiable for all $n$. The saturation property yields that also the infinite conjunction 
$\bigwedge_{i \ge 0} \varphi_i$ is satisfiable. Let $\alpha$ be a solution of $\bigwedge_{i \ge 0} \varphi_i$. Then the transition 
sequence $s_0', s_1', s_2', \ldots$ that we obtain by instantiating the states $s_i$ by the valuation 
$\alpha$ is a ground transition sequence that does not lead to the fail state. Hence, 
if $\alpha(x) = t$, then $p(t) \notin \mathit{GFF}(P)$ and $\mathit{gm}(P)(p)$ is nonempty. □ 

Remark 1. Palmgren [33] has shown that a constraint logic program over a constraint 
domain with the saturation property is canonical. That is, $\mathit{gfp}(T_P) = 
T_P \downarrow \omega$ holds (for the definition of $T_P$ see Section 4). 

Since $\mathit{gfp}(T_P) = B \setminus \mathit{GFF}(P)$ holds for canonical programs (see [25]), this is 
sufficient to characterize ground finite failure over infinite trees. Canonicity is 
not sufficient for finite failure of non-ground queries. 

For example, consider the program $p(f(x)) \leftarrow p(x)$ over the structure 
of finite trees. This program is canonical (over finite trees). Its greatest model 
over finite trees assigns $p$ the empty set (in accordance with the fact that $p(t) \in 
\mathit{GFF}(P)$ for all finite trees $t$), but $p(x)$ is not finitely failed. 

Similarly, Jaffar and Stuckey [26] have shown that for programs over infinite 
trees, $T_P \downarrow \omega$ equals the complement of $[\mathit{FF}(P)]$, where $[\mathit{FF}(P)]$ is the set of 
ground instances of elements of $\mathit{FF}(P)$. This is a characterization of the denotational 
semantics through the operational semantics; our characterization is the 
converse. 



Remark 2. The statement of Theorem 1 holds for constraint logic programs over 
every constraint system with the saturation property (‘an infinite set of constraints 
is satisfiable if each of its finite subsets is’). 

Remark 3. Since the structure of rational trees and the structure of infinite 
trees are elementarily equivalent [29] (in particular, the test of satisfiability of 
constraints is the same), we can take the operational semantics of programs 
over rational trees in Theorem 1 (but we must consider the logical semantics 
over infinite trees; note that rational tree constraints do not have the saturation 
property). The modified statement is: 

Given a logic program $P$ over rational trees, the query $p(x)$ is finitely failed if 
and only if the value of $p$ in the greatest model of $\mathit{compl}(P)$ over the domain 
of infinite trees is the empty set. 

3 Co-definite Set Constraints 

Syntax. A (general) set expression $e$ is built from first-order terms, union, 
intersection, complement, and the projection operator [21]: 

\[ e ::= x \mid f(\bar{e}) \mid e \cup e' \mid e \cap e' \mid e^{c} \mid f^{-1}_{(k)}(e) \]

The projection $f^{-1}_{(k)}(e)$ is only defined if $k$ is a positive integer not greater than the 
arity of $f$. If $e$ does not contain the complement operator, then $e$ is called a 
positive set expression. A (general) set constraint is a conjunction of inclusions 
of the form $e \subseteq e'$. 

A definite set constraint [21] is a conjunction of inclusions $e_l \subseteq e_r$ between 
positive set expressions, where the set expressions $e_r$ on the right-hand side of 
$\subseteq$ are furthermore restricted to contain only variables, constants and function 
symbols and the intersection operator (i.e., no projection or union). 

Definition 1. A co-definite set constraint is a conjunction of inclusions $e_l \subseteq 
e_r$ between positive set expressions, where the set expressions $e_l$ on the left-hand 
side of $\subseteq$ are further restricted to contain only variables, constants, unary 
function symbols and the union operator (that is, no projection, intersection or 
terms with a function symbol of arity greater than one). 

\[ e_l ::= x \mid a \mid f(e_l) \mid e_l \cup e_l' \qquad\qquad e_r ::= x \mid f(\bar{e}_r) \mid e_r \cup e_r' \mid e_r \cap e_r' \mid f^{-1}_{(k)}(e_r) \]

Semantics. We interpret set constraints over $2^{T_\Sigma^\infty}$, the domain of sets of trees 
over the signature $\Sigma$. That is, variables denote sets of trees, and a (set) valuation 
is a mapping $\sigma : \mathit{Var} \to 2^{T_\Sigma^\infty}$. Tree constructors are interpreted as functions over 
sets of trees: the constant $a$ is interpreted as $\{a\}$, and the function symbol $f$ is 
interpreted as the function which maps sets $S_1, \ldots, S_n$ into the set 

\[ \{ f(t_1, \ldots, t_n) \mid t_1 \in S_1, \ldots, t_n \in S_n \}. \]

The application of the projection operator for a function symbol $f$ and the $k$-th 
argument position on a set $S$ of trees is defined by 

\[ f^{-1}_{(k)}(S) = \{ t \mid \exists t_1, \ldots, t_n :\ t_k = t,\ f(t_1, \ldots, t_k, \ldots, t_n) \in S \}. \]

The set operators union $\cup$ and intersection $\cap$, as well as inclusion $\subseteq$, are 
interpreted as usual. Define the union of set valuations $\bigcup_i \sigma_i$ on variables as the 
pointwise union on the images of all variables; i.e., $(\bigcup_i \sigma_i)(x) = \bigcup_i \sigma_i(x)$. 
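
The interpretation of constructors and projections on sets can be tried out on small finite sets. The following is a minimal sketch only: it assumes finite trees encoded as nested Python tuples, and the names `apply_constructor` and `project` are illustrative rather than taken from the paper.

```python
from itertools import product

# Assumed encoding: a tree is a tuple (symbol, child_1, ..., child_n);
# a constant is the 1-tuple (symbol,).

def apply_constructor(symbol, *arg_sets):
    """Interpret f on sets: { f(t_1, ..., t_n) | t_k in S_k }."""
    return {(symbol, *children) for children in product(*arg_sets)}

def project(symbol, k, trees):
    """Interpret the projection of f on its k-th argument (1-based)."""
    return {t[k] for t in trees if t[0] == symbol and 1 <= k < len(t)}

a, b = ("a",), ("b",)
pairs = apply_constructor("f", {a, b}, {a})   # the set {f(a, a), f(b, a)}
firsts = project("f", 1, pairs)               # first arguments of those trees
seconds = project("f", 2, pairs)              # second arguments of those trees
```

With no argument sets, `apply_constructor("a")` yields `{("a",)}`, which matches the interpretation of a constant $a$ as the singleton $\{a\}$.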

The following properties hold for co-definite set constraints (see also [4]). 
The soundness proof for the abstraction in the following section relies on 
these properties. 

Proposition 1 (Properties of co-definite set constraints). 

1. Solutions of co-definite set constraints are closed under arbitrary unions. 
That is, the valuation $\bigcup_{i \in I} \sigma_i$ is a solution if the valuations $\sigma_i$, $i \in I$, are. 

2. If satisfiable, every co-definite set constraint $\varphi$ has a greatest solution, 
denoted $\mathit{gSol}(\varphi)$. 

3. Every co-definite set constraint without inclusions of the form $a \subseteq x$ is 
satisfiable. 

Proof. The first claim is proved by case-distinction over the possible set inclusions. 
The second is an immediate corollary from the first one. (Note that the 
restriction to constants and monadic function symbols on the left-hand side of an 
inclusion is crucial here. For instance, the set constraint $f(x, y) \subseteq f(a, a) \cup f(b, b)$ 
does not have a greatest solution; it has two maximal but incomparable ones.) 
In order to verify the third claim notice that the valuation which maps every 
variable into the empty set is a solution of co-definite set constraints without 
inclusions of the form $a \subseteq e$. □ 
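
The role of the monadic restriction in the first claim can be checked mechanically on the counterexample from the proof. The sketch below is an illustration under simplifying assumptions (finite trees as tuples, valuations restricted to subsets of $\{a, b\}$); the helper names are hypothetical.

```python
from itertools import product

a, b = ("a",), ("b",)

def binary_ok(x, y):
    """Does (x, y) satisfy f(x, y) ⊆ f(a, a) ∪ f(b, b)?"""
    lhs = {("f", s, t) for s, t in product(x, y)}
    return lhs <= {("f", a, a), ("f", b, b)}

def unary_ok(x):
    """Does x satisfy g(x) ⊆ g(a) ∪ g(b)?  (monadic left-hand side)"""
    return {("g", s) for s in x} <= {("g", a), ("g", b)}

sol1 = ({a}, {a})
sol2 = ({b}, {b})
# Pointwise union of the two valuations.
union = (sol1[0] | sol2[0], sol1[1] | sol2[1])

binary_union_is_solution = binary_ok(*union)   # union contains f(a, b): not a solution
unary_union_is_solution = unary_ok({a} | {b})  # union of unary solutions stays a solution
```

Both `sol1` and `sol2` satisfy the binary constraint, yet their pointwise union does not, so closure under unions fails once a binary symbol appears on the left; the analogous unary constraint has no such problem.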

Remark 4. Mishra [30] uses a class of set constraints with a non-standard 
interpretation over non-empty path-closed sets of finite trees to approximate the 
success set of a logic program. (A set of trees is path-closed if it can be recognized by 
a deterministic top-down tree automaton [20].) Set constraints over non-empty 
path-closed sets also have the properties 1. and 2. above. Due to the non-standard 
interpretation, this holds even if n-ary constructor terms are allowed on the left 
side of the inclusion. For example, the constraint $f(x, y) \subseteq f(a, a) \cup f(b, b)$ has 
a greatest solution over path-closed sets (which assigns both variables $x$ and $y$ 
the set $\{a, b\}$). 

4 Set-based Analysis 

We will next describe the inference of a co-definite set constraint $\varphi_P$ from a 
logic program $P$. The intuition is as follows. A clause of the form $p(t_i) \leftarrow \bigwedge_j p_{ij}(t_{ij})$ 
can be written equivalently as $p(x_i) \leftarrow x_i = t_i \wedge \bigwedge_j t_{ij} = x_{ij} \wedge p_{ij}(x_{ij})$. Following the 
abstract interpretation framework, we abstract the semantics-defining fixpoint 
operator $T_P$ by replacing the constraint $x_i = t_i \wedge t_{ij} = x_{ij}$ in its definition by the 
co-definite set constraint $x_i \subseteq t_i \wedge \Phi(t_{ij} \subseteq x_{ij})$; the operator $\Phi$ is defined below. 
The fixpoint equation for the abstract operator $T_P^{\#}$ is essentially the inferred set 
constraint $\varphi_P$. The soundness of the abstraction follows directly. The schema 
of our method (whose ingredients are Proposition 1 and Lemma 1 below) is 
described in an abstract setting in [12]. 

We next introduce the operator $\Phi$ that assigns to an inclusion of the form $t \subseteq x$ a 
co-definite set constraint. For example, $\Phi(f(x, y) \subseteq f(a, a) \cup f(b, b))$ is essentially 
the conjunction of $x \subseteq f^{-1}_{(1)}(f(a, a) \cup f(b, b))$ and $y \subseteq f^{-1}_{(2)}(f(a, a) \cup f(b, b))$, which 
is equivalent to the conjunction of $x \subseteq a \cup b$ and $y \subseteq a \cup b$. 
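
We introduce a fresh variable $z_t$ for each subterm $t$ appearing in the formula 
and then define the constraint $\Phi(t \subseteq x)$ for a term $t$ and a variable $x$ by induction 
on the depth of $t$. 

\[
\begin{aligned}
\Phi(y \subseteq x) &= y \subseteq x \\
\Phi(t \subseteq x) &= z_t \subseteq x \;\wedge\; z_{t_1} \subseteq f^{-1}_{(1)}(z_t) \wedge \Phi(t_1 \subseteq z_{t_1}) \;\wedge\; \ldots \;\wedge\; z_{t_n} \subseteq f^{-1}_{(n)}(z_t) \wedge \Phi(t_n \subseteq z_{t_n}) \qquad \text{for } t = f(t_1, \ldots, t_n)
\end{aligned}
\]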
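
The inductive definition of $\Phi$ translates directly into a recursive function. The following is a sketch under assumptions that are not in the paper: variables are Python strings, compound terms and constants are tuples, a projection $f^{-1}_{(k)}(z)$ is encoded as the tuple `("proj", f, k, z)`, and the fresh variable for a subterm is named after its printed form (so equal subterms share one variable).

```python
def show(term):
    """Render a term as text, e.g. ('f', 'x', 'y') -> 'f(x,y)'."""
    if isinstance(term, str):
        return term
    symbol, *subterms = term
    if not subterms:
        return symbol
    return symbol + "(" + ",".join(show(s) for s in subterms) + ")"

def phi(term, x):
    """Return Phi(term ⊆ x) as a list of (lhs, rhs) inclusions."""
    if isinstance(term, str):            # base case: Phi(y ⊆ x) = y ⊆ x
        return [(term, x)]
    z = "z_" + show(term)                # assumed naming of the fresh variable z_t
    inclusions = [(z, x)]                # z_t ⊆ x
    symbol, *subterms = term
    for k, sub in enumerate(subterms, start=1):
        z_sub = "z_" + show(sub)
        inclusions.append((z_sub, ("proj", symbol, k, z)))   # z_{t_k} ⊆ f^{-1}_{(k)}(z_t)
        inclusions.extend(phi(sub, z_sub))                   # Phi(t_k ⊆ z_{t_k})
    return inclusions

example = phi(("f", "x", "y"), "X")
```

Every left-hand side in the result is a plain variable, which is why the generated constraint stays within the co-definite class.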



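Lemma 1. If a tree valuation $\alpha : \mathit{Var} \to T_\Sigma^\infty$ satisfies the equality $x = t$, then 
the set valuation $\sigma_\alpha : \mathit{Var} \to 2^{T_\Sigma^\infty}$ defined by $\sigma_\alpha(x) = \{\alpha(x)\}$ satisfies the 
co-definite set constraints $x \subseteq t$ and $\Phi(t \subseteq x)$. □ 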

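We define the co-definite constraint $\varphi_P$ inferred from $P$ as follows. Here, we 
assume that the different clauses are renamed apart (if not, we apply $\alpha$-renaming 
to quantified variables). 

\[
\varphi_P \;=\; \bigwedge_{p \in \mathit{Pred}} \Bigl( p \subseteq \bigcup_{i=1}^{n_p} t_i \;\wedge\; \bigwedge_{i=1}^{n_p} \bigwedge_{j=1}^{m_{i,p}} \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq p_{ij} \Bigr)
\]

Both symbols $p \in \mathit{Pred}$ and $x \in \mathit{Var}$ act here as second-order variables ranging 
over sets of trees. In the following, when we compare an interpretation $\rho$ of a logic 
program with a valuation $\sigma$ of a set constraint, $\rho \subseteq \sigma$ means that $\rho(p) \subseteq \sigma(p)$ 
for all $p \in \mathit{Pred}$. 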

Theorem 2 (Soundness of Abstraction). 

For a logic program $P$, the greatest model of $P$’s completion is smaller than the 
greatest solution of $\varphi_P$, formally $\mathit{gm}(P) \subseteq \mathit{gSol}(\varphi_P)$. 

Proof. We first define an abstraction $T_P^{\#}$ of the $T_P$ operator, and we prove 
that $\mathit{gfp}(T_P) \subseteq \mathit{gfp}(T_P^{\#})$, using Lemma 1. In the second part we show that 
$\mathit{gfp}(T_P^{\#}) \subseteq \mathit{gSol}(\varphi_P)$, using here Proposition 1. 

1. $\mathit{gfp}(T_P) \subseteq \mathit{gfp}(T_P^{\#})$. The $T_P$ operator maps an interpretation $\rho$ to another 
one $T_P(\rho)$ where, for all $p \in \mathit{Pred}$, 

\[
T_P(\rho)(p) = \Bigl\{ t \in T_\Sigma^\infty \;\Bigm|\; \exists \alpha : \mathit{Var} \to T_\Sigma^\infty,\ \exists i :\ t = \alpha(t_i),\ \ T_\Sigma^\infty, \alpha \models \bigwedge_j t_{ij} \in \rho(p_{ij}) \Bigr\}.
\]

As usual, we write $\mathcal{M}, \alpha \models F$ if the formula $F$ is valid under the interpretation 
with the valuation $\alpha$ on the structure (with the domain) $\mathcal{M}$. The greatest-model 
semantics and the greatest-fixpoint semantics of a program $P$ coincide; i.e., the 
greatest model of $P$’s completion is the greatest fixpoint of the operator $T_P$, 
formally $\mathit{gm}(P) = \mathit{gfp}(T_P)$ (see e.g. [28]). 

The $T_P^{\#}$ operator maps an interpretation $\rho$ to the interpretation $T_P^{\#}(\rho)$ where, 
for all $p \in \mathit{Pred}$, 

\[
T_P^{\#}(\rho)(p) = \Bigl\{ t \in T_\Sigma^\infty \;\Bigm|\; \exists \sigma : \mathit{Var} \to 2^{T_\Sigma^\infty},\ \exists i :\ t \in \sigma(t_i),\ \ 2^{T_\Sigma^\infty}, \sigma \models \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr\}.
\]

Here, we use new variables $x_{ij}$ as placeholders for $p_{ij}$. The variables $x \in \mathit{Var}$ 
now range over sets of trees. The formula above is a co-definite set constraint 
with additional constants denoted $\rho(p_{ij})$. The constant $\rho(p_{ij})$ is interpreted as the 
set $\rho(p_{ij})$. 

Let $\rho' = T_P(\rho)$ and $\rho'' = T_P^{\#}(\rho)$. Then $\rho'(p) \subseteq \rho''(p)$ holds for all $p \in \mathit{Pred}$. 
This can be seen as follows. For every tree valuation $\alpha$ satisfying the condition 
in the set comprehension for $\rho'$, the set valuation $\sigma_\alpha$ defined by $\sigma_\alpha(x) = \{\alpha(x)\}$ 
satisfies the condition in the set comprehension for $\rho''$. Clearly, $\sigma_\alpha(t_{ij}) \subseteq \rho(p_{ij})$; 
we replace the inclusion $t_{ij} \subseteq \rho(p_{ij})$ by the equivalent conjunction $t_{ij} = x_{ij} \wedge 
x_{ij} \subseteq \rho(p_{ij})$. If $\sigma_\alpha$ satisfies the equality $t_{ij} = x_{ij}$ then also $\Phi(t_{ij} \subseteq x_{ij})$ by 
Lemma 1. 

Hence, $T_P^{\#}$ is indeed an abstraction of $T_P$, and, thus, $\mathit{gfp}(T_P) \subseteq \mathit{gfp}(T_P^{\#})$. 
This concludes the first part of the proof. 

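2. $\mathit{gfp}(T_P^{\#}) \subseteq \mathit{gSol}(\varphi_P)$. In order to show that $\mathit{gfp}(T_P^{\#}) \subseteq \mathit{gSol}(\varphi_P)$, we 
first reformulate the definition of $T_P^{\#}$ as follows. 

\[
T_P^{\#}(\rho)(p) = \bigcup_i \; \bigcup \Bigl\{ \sigma(t_i) \;\Bigm|\; \sigma : \mathit{Var} \to 2^{T_\Sigma^\infty},\ \sigma \models \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr\}
\]

Fix $\rho$ and let $\rho'' = T_P^{\#}(\rho)$. 

We next exploit the fact that the solutions of co-definite set constraints are 
closed under arbitrary unions (Proposition 1). Hence, we can replace the union 
of solutions in the formula above by the greatest solution. We obtain that 

\[
\rho''(p) = \bigcup_i \sigma_i(t_i) \quad\text{where}\quad \sigma_i = \mathit{gSol}\Bigl( \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr).
\]

Since all program variables are renamed apart, we have $\rho''(p) = \bigcup_i \sigma(t_i)$ where 

\[
\sigma = \mathit{gSol}\Bigl( \bigwedge_i \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr).
\]

Thus, we have $\rho''(p) = \sigma(p)$ where 

\[
\sigma = \mathit{gSol}\Bigl( p = \bigcup_i t_i \;\wedge\; \bigwedge_i \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr).
\]

Again, since all program variables are renamed apart, 

\[
\rho'' = \mathit{gSol}\Bigl( \bigwedge_{p \in \mathit{Pred}} p = \bigcup_i t_i \;\wedge\; \bigwedge_i \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho(p_{ij}) \Bigr).
\]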

Here, we equate the interpretation $\rho'' : \mathit{Pred} \to 2^{T_\Sigma^\infty}$ with a valuation $\sigma$ 
interpreting a formula with predicate symbols $p \in \mathit{Pred}$ and tree variables $x \in \mathit{Var}$ 
both ranging over sets of trees, and with constants of the form $\rho(p_{ij})$ standing 
for the corresponding sets. We omit any further formalization of this setting. 

Let $\rho_0$ be any fixpoint of $T_P^{\#}$, i.e., $T_P^{\#}(\rho_0) = \rho_0$. This means that $\rho_0$ is a 
solution (the greatest one, in fact) of 

\[
\bigwedge_{p \in \mathit{Pred}} p = \bigcup_i t_i \;\wedge\; \bigwedge_i \bigwedge_j \Phi(t_{ij} \subseteq x_{ij}) \wedge x_{ij} \subseteq \rho_0(p_{ij}).
\]

That is, $\rho_0$ is a solution of $\varphi_P$. Hence, $\rho_0$ is smaller than the greatest solution 
of $\varphi_P$. This is true in particular if $\rho_0$ is chosen as the greatest fixpoint of $T_P^{\#}$. 
This concludes the second part of the proof. □ 

Theorem 3 (Set-based failure analysis for logic programs). 

The query $p(x)$ is finitely failed in every fair execution of the logic program $P$ 
if the value of $p$ in the greatest solution (over sets of infinite trees) of the 
co-definite set constraint $\varphi_P$ derived from $P$ is the empty set; i.e., for all predicates 
$p \in \mathit{Pred}$, if $\mathit{gSol}(\varphi_P)(p) = \emptyset$ then $p(x) \in \mathit{FF}(P)$. 

Proof. We combine Theorems 2 and 1. □ 

A more precise formulation of the statement above is: the emptiness of the 
computed value for an argument variable in the $i$-th clause of $p$ entails the finite 
failure of every predicate call of $p$ with that clause. 

Remark 5. Since the domains of infinite and rational trees are equivalent with respect 
to finite failure, and failure over infinite trees implies failure over finite trees, we 
have the following two statements. 

Given a logic program $P$ over rational trees [over finite trees], the query $p(x)$ is 
finitely failed if the value of $p$ in the greatest solution over sets of infinite trees 
of the co-definite set constraint $\varphi_P$ derived from $P$ is the empty set. 

Remark 6. Essentially, the set constraint derived from a logic program P in the 
‘least-model’ analysis of Mishra [30] is of the form 

Tip Up rii^p 

p^Pred i '^3 

Instead of Lemma 1, we have the obvious fact that the set valuation $\sigma_\alpha$ defined 
by $\sigma_\alpha(x) = \{\alpha(x)\}$ satisfies the set constraint $x = t$ (which is equivalent to $t = x$) 
if the tree valuation $\alpha$ satisfies the tree constraint $x = t$. Since we also have the 
existence of greatest solutions over the domain of non-empty path-closed sets of 
(finite or infinite) trees (see Remark 4), the proof of Theorem 2 goes through 
also for $\psi_P$ instead of $\varphi_P$, and the statements in this and the next section 
hold in the appropriate adaptation. One can prove that $\mathit{gSol}(\varphi_P) \subseteq \mathit{gSol}(\psi_P)$ 
(see [5]), i.e. the analysis using path-closed constraints is less accurate than the 
one with co-definite set constraints. Solving path-closed constraints is still an 
open problem (both for least and for greatest solutions). 

5 Concurrent Constraint Programs 

We consider concurrent constraint (cc) programs (see e.g. [36,37]) in a normalized 
form such that we can employ a Prolog-style clausal syntax. This is a notational 
convention which is convenient to establish a connection to logic programming. 

Furthermore, we consider only the case where constraints $C$ are term equations 
$t_1 = t_2$ interpreted over infinite trees, as in the cc programming language 
and system Oz [32,37]. Hence, we can adopt a Prolog-like syntax and assume 
that every procedure $p$ is defined either by a single fact or by several guarded 
clauses of the form 

\[ p(x) \leftarrow x = t \mid p_1(t_1), \ldots, p_n(t_n). \]

In such a guarded clause, we call $x = t$ the guard and $p_1(t_1), \ldots, p_n(t_n)$ the body. 

The operational semantics of a cc program $P$ is defined through a fair transition 
system $\mathcal{T}_P^{cc}$ as $\mathcal{T}_P$ for logic programs (again with the non-deterministic fair 
selection rule), with one important difference: a selected query atom $p(t)$ can only be 
applied if amongst the guarded clauses of predicate $p$ there is one, say the one 
with guard and body $x = t_i \mid \bigwedge_j p_{ij}(t_{ij})$, such that $x = t$ entails $\exists_{-x}\, x = t_i$. If this is 
the case in a state $s$, then the successor state will be $s \wedge \bigwedge_j p_{ij}(t_{ij})$ under the 
most general unifier of $t$ and $t_i$ (for a more precise definition, see e.g. [36,37]). 
Notice that a logic program is a special case of a cc program where all guards 
are trivially true, e.g. $x = x$. 
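
For finite terms, the entailment test on a guard amounts to one-way matching of the clause term against the call argument, in contrast to the two-way unification used for logic programs. The sketch below illustrates that difference under simplifying assumptions that are not from the paper: terms are finite, variables are Python strings, clause and query variables are disjoint, and the function name `match` is hypothetical. It does not distinguish a guard that may become entailed later (suspension) from one that can never be entailed.

```python
def match(pattern, term, binding=None):
    """One-way matching: find values for the pattern's variables that make
    the pattern equal to the term, without instantiating the term.
    Returns a dict on success and None otherwise."""
    binding = dict(binding or {})
    if isinstance(pattern, str):              # clause variable
        if pattern in binding:
            return binding if binding[pattern] == term else None
        binding[pattern] = term
        return binding
    if isinstance(term, str):                 # unbound query variable:
        return None                           # the guard is not entailed (yet)
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None                           # constructor clash
    for p_sub, t_sub in zip(pattern[1:], term[1:]):
        binding = match(p_sub, t_sub, binding)
        if binding is None:
            return None
    return binding

entailed = match(("f", "Z"), ("f", ("a",)))               # call p(f(a)), guard x = f(Z)
suspended = match(("f", "Z"), "Y")                        # call p(Y) with Y unbound
nonlinear = match(("f", "Z", "Z"), ("f", "Y1", "Y2"))     # guard needs equal arguments
```

In the second and third calls unification would succeed by binding query variables, whereas matching reports that the guard is not entailed; this is the gap that the abstraction from cc programs to logic programs ignores.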

Failure of cc programs. We next apply the approximation method of the previous 
section to logic programs abstracting cc programs in order to predict the behavior 
of the latter. 

Define the logic program $P^{\#}$ abstracting the cc program $P$ by replacing the 
guard operator $\mid$ with conjunction. It is an abstraction in the following sense. 

Proposition 2. If the query $p(t)$ finitely fails in the logic program $P^{\#}$ abstracting 
the cc program $P$ then failure is inevitable in fair executions of the cc program $P$ 
unless a process (i.e. a predicate call) suspends forever. 

Proof. Observe that every (finite or infinite) fair computation in $P$ in which 
no process suspends forever induces a fair computation in $P^{\#}$. Namely, whenever 
a selected query atom $p(t)$ is applied with a guarded clause in $P$ it can also be 
applied with the associated unguarded clause in $P^{\#}$. This proves the claim by 
contraposition. □ 



Proposition 3 (Prediction of failure behavior of cc programs). 

Failure is inevitable in fair executions of the cc program $P$ unless a process 
suspends forever, if the value of $p$ in the greatest model of $\mathit{compl}(P^{\#})$ over the 
domain of infinite trees is the empty set. 

Proof. We combine Proposition 2 and Theorem 1. □ 

Theorem 4 (Set-based failure analysis for cc programs). 

Failure is inevitable in fair executions of the cc program $P$ unless a process 
suspends forever if the value of $p$ in the greatest solution (over sets of infinite 
trees) of the co-definite set constraint $\varphi_{P^{\#}}$ derived from $P^{\#}$ is the empty set. 

Proof. We combine Theorem 3 and Proposition 3. □ 



6 Examples 

We will give some examples to illustrate how our method of approximating 
greatest models with co-definite set constraints tests the inevitability of certain 
runtime errors. Consider the following simple stream program. 

    stream([X,Y|S]) ← Y = s(s(X)), computation(X), stream([Y|S]). 
    main(Z) ← stream([Z|T]). 

Suppose we know that the predicate computation makes sense only for (trees 
representing) odd numbers, whereas no such restriction is known for main and 
stream. This invariant can be expressed by the following set constraint, which 
may have been derived from another code fragment or externally provided by a 
program annotation. 

\[ \mathit{computation} \subseteq s(0) \cup s(s(\mathit{computation})). \tag{1} \]

Further, we can approximate the set of non-failed computations of the program 
with the constraint 

\[
\begin{aligned}
& \mathit{stream} \subseteq \mathit{cons}(X, \mathit{cons}(Y, S)) \;\wedge \\
& X \subseteq \mathit{computation} \;\wedge\; X \subseteq s^{-1}(s^{-1}(Y)) \;\wedge \\
& Y \subseteq \mathit{cons}^{-1}_{(1)}(\mathit{stream}) \;\wedge\; S \subseteq \mathit{cons}^{-1}_{(2)}(\mathit{stream}) \;\wedge \\
& \mathit{main} \subseteq \mathit{cons}^{-1}_{(1)}(\mathit{stream}).
\end{aligned}
\tag{2}
\]

It is not difficult to see that the greatest solution of the conjunction of (1) 
and (2) assigns to the variable main (as well as to X, Y, and computation) the 
set of odd numbers. We obtain from this fact that, for example, the query main(0) 
inevitably leads to a state where computation is called with a wrong argument. 
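
The effect of constraint (1) alone can be reproduced by a downward iteration. This is a sketch under strong simplifications that are not in the paper: the numeral $s^n(0)$ is encoded as the Python integer `n`, only numerals up to an arbitrary bound are considered (so infinite trees are left out), and the function name is hypothetical.

```python
def greatest_solution_computation(bound):
    """Greatest X ⊆ {0, ..., bound} with X ⊆ {1} ∪ {n + 2 | n ∈ X},
    i.e. constraint (1) read on integer-encoded numerals."""
    current = set(range(bound + 1))            # start from the top element
    while True:
        allowed = {1} | {n + 2 for n in current}
        refined = current & allowed            # drop numerals violating (1)
        if refined == current:                 # stable: a (greatest) solution
            return current
        current = refined

odd_numerals = greatest_solution_computation(10)
```

Starting from the full set and repeatedly removing violating elements reaches the greatest solution on this finite universe; the numeral 0 is removed in the first round, which mirrors why the query main(0) is flagged.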

We now illustrate the necessity to consider infinite trees by another example. 
Consider the reactive logic program $P$ defined by 

\[ p(f(x)) \leftarrow p(x). \]

The execution of the query $p(x)$ does not fail, whether the program is defined 
over the domain of finite or infinite trees. We derive the co-definite set constraint 
$\varphi_P = p \subseteq f(x) \wedge x \subseteq p$. When interpreted over sets of finite trees, $\varphi_P$ has as 
greatest solution the valuation assigning the empty set to $p$ (and $x$). In the 
infinite tree case the greatest solution assigns to $p$ the singleton set containing 
the infinite tree $f(f(f(\ldots)))$. That is, an interpretation of the derived co-definite 
set constraint over sets of finite trees does not admit the prediction of finite 
failure. 
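
The contrast between the two interpretations can be observed on a toy universe. The following sketch makes assumptions beyond the text: it adds a constant $a$ so that finite trees exist, encodes the finite tree $f^n(a)$ as the integer `n` up to a small bound, and represents the single infinite tree $f(f(f(\ldots)))$ by the marker `OMEGA`.

```python
OMEGA = "f^omega"   # stands for the infinite tree f(f(f(...)))

def apply_f(trees):
    """Interpret f on a set: f^n(a) becomes f^(n+1)(a); f(OMEGA) is OMEGA."""
    return {t if t == OMEGA else t + 1 for t in trees}

def greatest_solution(universe):
    """Greatest (p, x) within the universe satisfying p ⊆ f(x) and x ⊆ p."""
    p, x = set(universe), set(universe)
    while True:
        new_p = p & apply_f(x)      # enforce p ⊆ f(x)
        new_x = x & new_p           # enforce x ⊆ p
        if (new_p, new_x) == (p, x):
            return p, x
        p, x = new_p, new_x

finite_only = set(range(6))
p_finite, x_finite = greatest_solution(finite_only)
p_infinite, x_infinite = greatest_solution(finite_only | {OMEGA})
```

On the finite part every tree is eventually removed, because membership of $f^n(a)$ would require membership of $f^{n-1}(a)$ all the way down to $a$, which has no $f$ at the root; only the infinite tree supports itself.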

7 Conclusion 

We have presented a set-based analysis of logic programs with ongoing behavior 
(i.e. with the greatest-fixpoint semantics). We have given a characterization of 
finite failure of logic programs over rational or infinite trees through the greatest 
model over infinite trees, and we have exhibited a connection between the in- 
evitability of ‘inconsistent-store’ runtime error for cc programs and finite failure 
for logic programs, thus indicating a potential application to error diagnosis for 
cc programs. 

Our ‘greatest-model’ set-based analysis of logic programs is interesting in its 
own right, as a particular instance of static analysis, and also in comparison with 
the ‘least-model’ set-based analyses of classical logic programs e.g. by Mishra [30] 
or by Heintze and Jaffar [22]. 

The practicability of our approach depends on the efficiency of the constraint 
solving. Succeeding the technical report [8] on which this paper and [4] are based, 
Devienne, Talbot and Tison [16] have given a strategy for solving co-definite set 
constraints which may achieve an exponential speedup. The realization of this 
set-based analysis for the Oz system, and its extension to reactive Oz programs 
with non-cc features such as cells and higher-order features is part of ongoing 
work. We have implemented a prototype version (with an incomplete constraint 
solver); experiments seem to indicate its potential usefulness for finding bugs. 

One question arising from this work and the work by Cousot and Cousot 
in [12] is whether this set-based analysis is an instance of an abstract interpre- 
tation, i.e., whether our constraint-solving process is isomorphic to the iteration 
of an abstraction of the Tp fixpoint operator. 

Abstract. MetaML is a multi-stage functional programming language 
featuring three constructs that can be viewed as statically-typed refine- 
ments of the back-quote, comma, and eval of Scheme. Thus it provides 
special support for writing code generators and serves as a semantically- 
sound basis for systems involving multiple interdependent computational 
stages. In previous work, we reported on an implementation of MetaML, 
and on a reduction semantics and type-system for MetaML. In this pa- 
per, we present An Idealized MetaML (AIM) that is the result of our 
study of a categorical model for MetaML. An important outstanding 
problem is finding a type system that provides the user with a means 
for manipulating both open and closed code. This problem has eluded 
efforts by us and other researchers for over three years. AIM solves the 
issue by providing two type constructors, one classifying closed code and 
the other open code, and exploiting the way they interact. We point out 
that AIM can be verbose, and outline a possible remedy relating to the 
strictness of the closed code type. 



1 Introduction 

“If thought corrupts language, language can also corrupt thought”. Staging 
computation into multiple steps is a well-known optimization technique used in many 
important algorithms, such as high-level program generation, compiled program 
execution, and partial evaluation. Yet few typed programming languages allow 
us to express staging in a natural and concise manner. MetaML was designed to 
fill this gap. Intuitively, MetaML has a special type for code that combines some 
features of both open code, that is, code that can contain free variables, and 
closed code, that is, code that contains no free variables. In a statically typed 
setting, open code and closed code have different properties, which we explain 
in the following section. 



Open and Closed Code. Typed languages for manipulating code fragments 
either have a type constructor for open code [9,6,3,11], or a type constructor for 
closed code [4,13]. Languages with open code types are useful in the study of 
partial evaluation. Typically, they provide constructs for building and combining 
code fragments with free variables, but do not allow the execution of such 
fragments. Being able to construct open fragments enables the user to force 
computations “under a lambda”. Executing code fragments in such languages 
is hard because code can contain “not-yet-bound identifiers”. In contrast, 
languages with closed code types are useful in the study of run-time (machine) code 
generation. Typically, they provide constructs for building and executing code 
fragments, but do not allow computations “under a lambda”. 

The importance of having both a way to construct and combine open code and 
to execute closed code within the same language can be intuitively explained in 
the context of Scheme. Efficient implementations of Domain-Specific or “little” 
languages can be developed as follows: First, build a translator from the source 
language to Scheme, then use eval to execute the generated Scheme code. Because 
such a translator will be defined by induction over the structure of the source 
term, it will need to return open terms when building the inside of a A-abstraction 
(or any such binding construct), which can (and will often) contain free variables. 
For many languages, such an implementation would be almost as simple as an 
interpreter for the source language (especially if back-quote and comma are 
used), but would have almost no interpretative overhead. 



MetaML. MetaML [11,10] provides three constructs for manipulating open code 
and executing it: Brackets <_>, Escape ~_ and Run run _. An expression <e> defers 
the computation of e; ~e splices the deferred expression obtained by evaluating 
e into the body of a surrounding Bracketed expression; and run e evaluates e to 
obtain a deferred expression, and then evaluates it. Note that ~e is only legal 
within lexically enclosing Brackets. Finally, Brackets in types such as <int> are 
read “Code of int”. To illustrate, consider the following interactive session: 

    -| val rec exp = fn n => fn x => 
           if n=0 then <1> else < ~x * ~(exp (n-1) x) >; 
    val exp = fn : int -> <int> -> <int> 

    -| val exponent = fn n => 
           <fn a => ~(exp n <a>)>; 
    val exponent = fn : int -> <int -> int> 

    -| val cube = exponent 3; 
    val cube = <fn a => a * (a * (a * 1))> : <int -> int> 

    -| val program = <~cube 2>; 
    val program = <(fn a => a * (a * (a * 1))) 2> : <int> 

    -| run program; 
    val it = 8 : int 

Given an integer exponent n and a code fragment representing a base x, the func- 
tion exp returns a code fragment representing a power. The function exponent 
is similar, but takes only an integer and returns a code fragment representing a 
function that takes a base and returns the power. The code fragment cube is the 
specialization of exponent to the power 3. Next, we construct the code fragment 
program which is an application of the code of cube to the base 2. Finally, the 
last declaration executes this code fragment. 
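
For readers without a MetaML system at hand, the same staging pattern can be imitated with source strings. This is only a loose analogue and rests on assumptions of our own: Python strings play the role of code values, `eval` plays the role of run, and none of MetaML's static typing or scoping guarantees carry over.

```python
def exp(n, x):
    """Return source text for x raised to the power n; x is source text too."""
    if n == 0:
        return "1"
    return f"{x} * ({exp(n - 1, x)})"

def exponent(n):
    """Return source text for a function computing the n-th power."""
    return f"lambda a: {exp(n, 'a')}"

cube_source = exponent(3)    # first stage: generate specialised code
cube = eval(cube_source)     # second stage: turn the text into a function
result = cube(2)
```

The generated text contains no recursion and no test on `n`; all of that work happened in the first stage. Unlike Brackets and Escape, nothing here prevents building ill-formed or ill-scoped code, which is exactly the kind of error a typed multi-stage language rules out.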



Problem. Unfortunately, the last declaration is not typable with the basic type 
system of MetaML [10]. The essence of the problem seems to be that MetaML 
has only one type constructor for code. Intuitively, to determine which code 
fragments can be executed safely, the MetaML type system must keep track of 
variables free in a code fragment. But there is no way for the type system to 
know that program is closed from its type, hence, a conservative approximation 
is made, and the term is rejected by the type system. 



Contribution and Organization of this Paper. In previous work [11], we 
reported on the implementation and applications of MetaML, and later [10] 
studied a reduction semantics and a type system for MetaML. However, there 
were still a number of drawbacks: 

1. As discussed above, there is a typing problem with executing a separately- 
declared code fragment. While this problem is addressed in the implementation 
using a special typing rule for top-level declarations [12], this solution is ad 
hoc. 

2. Only a call-by-value semantics could be defined for MetaML, because substi- 
tution was a partial function, only defined when variables are substituted with 
values. 

3. The type judgment is needlessly complicated by the use of two indices. More- 
over, the type system has been criticized for not being based on a standard 
logical system [13]. 

This paper describes the type system and operational semantics of An Idealized 
MetaML (AIM), whose design is inspired by a categorical model for MetaML [1]. 
AIM is strictly more expressive than any known typed multi-level language, and 
features: 
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1. An open code type (t), which corresponds to Qt of A® [3] and (t) of MetaML; 

2. A closed code type [t], which corresponds to \3t of [4]; 

3. Cross-stage persistence of MetaML; 

4. A Run-With construct, generalizing Run of MetaML. 

In a capsule, the model-theoretic approach has guided the design of AIM in two 
important ways: First, to achieve a semantically sound integration of Davies and 
Pfenning’s A'^ [4] and Davies’ A^ [3], we must use two separate type constructs, 
and not one, as was the case with MetaML. Second, we identified a canonical 
isomorphism between the (effect-free interpretation of the) two types [t] and 
[(f)]. This isomorphism formalized the interaction between open and closed code 
types, and lead us to both a generalization of Run, and to identifying a new 
and important (effectful) combinator that we have called compile: [{t)] — > [t]. In 
addition, the model-theoretic approach has suggested a number of simplifications 
over MetaML [10], which overcome the problems mentioned above: 

1. The type system uses only one level annotation, like the A^type system [3]; 

2. The level Promotion and level Demotion lemmas (cf. [10]), and the Substitu- 
tion lemma, are proven in full generality and not just for the cases restricted 
to values. This development is crucial for a call-by-name semantics. Such a se- 
mantics seems to play an important role in the formal theory of Normalization 
by Evaluation and Type Directed Partial Evaluation [2]; 

3. The big-step semantics is defined in the style in which A^was defined [3], and 
does not make explicit use of a stateful renaming function; 

4. Terms have no explicit level annotations. 

Furthermore, it is straightforward to extend AIM with new base types and con- 
stants, therefore it provides a general setting for investigating staging combina- 
tors. 

In the rest of the paper, we present the type system and establish several of its 
syntactic properties. We give a big-step semantics of AIM, including a call-by- 
name variant, and prove type-safety. We present embeddings of A^, MetaML 
and A'^ into AIM. Finally, we discuss related works. 

2 AIM: An Idealized MetaML 

The definition of AIM’s types t G T and terms e G if is parameterized with 
respect to a signature consisting of a set of base types b and constants c: 

t e T:: = b \ ti ^ t2 \ {t) \ [t] 

e £ E: : = c \ X \ Cl C 2 \ Xx.e \ (e) | ~e | run e with {xi = Ci\i G m} | 
box e with {xi = Ci\i G m} | unbox e 

where m is a natural number, and is identified with the set of its predecessors. 
The first four constructs are the standard ones in a call-by- value A-calculus with 
constants. Bracket and Escape are the same as in MetaML [11,10]. Run-With 
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r, x: t'l h e: 

r h c: C r h x: r if r X = t"* and m < n ^ ^ 

rh Ax.e:(fi 

Eh ei:(ti El-e2:tr Eheir+i E h e: (f)" 

Eh ei 62:12 Eh (e):(f)" Eh ~e:r+^ 

Ehe^:[E]" E+Mx^:[E]"|i£m}he:(f)" 

E h run e with Xi = 

E h 6i: [fi]" {xi: € m} h e:f° E h e: [f]" 

E h box e with Xi = d: [t]" E h unbox e:t" 

Fig. 1. Typing Rules 

generalizes Run of MetaML, in that it allows the use of additional variables Xi in 
the body of e if they satisfy certain typing requirements that are made explicit 
in the next section. Box-With and Unbox are not in MetaML, but are motivated 
by A'^of Davies and Pfenning [4]. We use some abbreviated forms: 

run e for run e with 0 
box e for box e with 0 

run e with Xi = Cj for run e with {xi = 6i\i G m} 
box e with Xi = ei for box e with {xi = 6i\i G m} 



2.1 Type System 

An AIM typing judgment has the form E h e: t", where t G T, n G N and E is 
a type assignment, that is, a finite set {xii tf'\i G m} with the Xi distinct. The 
reading of E h e: t” is “term e has type t at level n in the type assignment E ”. 
The level of a subterm is the number of surrounding Brackets, minus the number 
of surrounding Escapes. If not otherwise indicated, the level of a term is zero. 
We say that E a; = if x: is in E. Furthermore, we write E+’’ for the type 

assignment obtained by incrementing the level annotations in E by r, that is, 
E+’’ a; = if and only if E a; = f”. Figure 1 gives the typing rules for AIM. 
The Constant rule says that a constant c of type E, which has to be given in the 
signature, can be used at any level n. The Variable rule incorporates cross-stage 
persistence, therefore if x is introduced at level m it can be used later, that 
is, at level n > m, but not before. The Abstraction and Application rules are 
standard. The Bracket and Escape rules establish an isomorphism between 
and (t)”. Typing Run in MetaML [10] introduces an extra index-annotation on 
types for counting the number of Runs surrounding an expression (see Figure 3). 
We avoid this extra annotation by incrementing the level of all variables in E. 
In particular, the Run rule of MetaML becomes 

E+i h e: (t)" 

E h run e: t” 
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The Box rule ensures that there are no “late” free variables in the term being 
Boxed. This ensures that when a Boxed term is evaluated, the resulting value is 
a closed term. The Box rule ensures that only With-bound variables can occur 
free in the term e. At the same time, it ensures that no “late” free variable can 
infiltrate the body of a Box through a With-bound variable. This is accomplished 
by forcing the With-bound variables themselves to have a Boxed type. Note that 
in run e with Xi = ei the term e may contain other free variables besides the xi. 

2.2 Properties of the Type System 

The following level Promotion, level Demotion and Substitution lemmas are 
needed for proving Type Preservation. 

Lemma 1 (Promotion). If A, ^2 b e: then A, ^ • 

Meaning that if we increment the level of a well-formed term e it remains well- 
formed. Furthermore, we can simultaneously increment the level of an arbitrary 
subset of the variables in the environment. In this paper, proofs are omitted for 
brevity (Please see technical report for proof details [8]). 

Demotion on e at n, written e|„, lowers the level of e from level n -I- 1 down to 
level n, and is well-defined on all terms, unlike demotion for MetaML [ 10 ]. 

Definition 1 (Demotion). e|„ is defined by induction on e: 

c[n=C 

x[n=X 

(ci 02') \.n — \-n ^2 tn 

(Ax.e) i„=Aa;.ei„ 

(c) in = (c in-l-l) 

~e|o=run e 

( c)in-|-l= (e|n) 

(run e with Xi = ei)|„=run ej,„ with Xi = ei|„ 

(box e with Xi = Ci) |„=box e with Xi = Ci 
(unbox e) J.„=unbox e 

The key for making demotion total on all terms is handling the case for Escape 
~e|o: Escape is simply replaced by Run. It should also be noted that demotion 
does not go into the body of Box. 

Lemma 2 (Demotion). If h e: then F h e|„: t”. 

Meaning that if we demote a well-formed term e it remains well-formed, provided 
the level of all free variables is decremented. 

Lemma 3 (Weakening). If A, A b ^ 2 - ftf and x is fresh, then A, x: , A b 

62:^2- 
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Lemma 4 (Substitution). // A h ei: anrf Li, a;: t” , /2 h 62: ^2 A, L2 h 

62(3;: = ei]: t2 • 

This is the expected substitution property, that is, a variable x can be replaced 
by a term ei, provided ci meets the type requirements on x. 

3 Big-Step Semantics 

The big-step semantics for MetaML [11] reflects the existing implementation: it 
is complex, and hence not very suitable for formal reasoning. Figure 2 presents 
a concise big-step semantics for AIM, which is presented at the same level of 
abstraction as that for [3] . We avoid the explicit use of a gensym or newname 
for renaming bound variables, which here is implicitly done by substitution. 

Definition 2 (Values). 

yO g yO .._ g I yg;.e | (v^) \ box e 

tji g yi : : = c I a; I I Aa;.u^ | (u^) | run with Xi = vj \ 

box e with Xi = vj \ unbox 

yTi+2 g yn+2 . . _ c I a; I j Aa;.u”“'"^ j j j 

run with Xi = | box e with Xi = | unbox 

Values have three important properties: First, a value at level 0 can be a Brack- 
eted or a Boxed expression, reflecting the fact that terms representing open and 
closed code are both considered acceptable results from a computation. Second, 
values at level n+1 can contain Applications such as {{Xy.y) {Xx.x)), reflecting 
the fact that computations at these levels can be deferred. Finally, there are no 
level 1 Escapes in values, reflecting the fact that having such an Escape in a 
term would mean that evaluating the term has not yet been completed. This is 
true, for example, in terms like (~(/ x)). 

Lemma 5 (Orthogonality). If v € and F \- v: [t]*^ then %\~ v. [t]°. 



Theorem 1 (Type Preservation). If h e: and e ^ v then v g V” 

and F'^^ F u: t”. 

Note that in AIM (unlike ordinary programming languages) we cannot restrict 
the evaluation rules to closed terms, because at levels above 0 evaluation is 
symbolic and can go inside the body of binding constructs. On the other hand, 
evaluation of a variable at level 0 is an error! The above theorem strikes the right 
balance, namely it allows open terms provided their free variables are at level 
above 0 (this is reflected by the use of in the typing judgment). 

Having no level 1 Escapes ensures that demotion is the identity on as 

shown in the following lemma. Thus, we don’t need to perform demotion in the 
evaluation rule for Run when evaluating a well-formed term. 
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Evaluation. 



0 0 0 
ei ^ Xx.e 62 ^ vi e|a;:= uij ^ V2 

0 

6l 62 ^ V 2 

6i ^ Vi elxn = Vij ^ {V ) V io ^ V 

n 0 

run 6 with Xi = d ^ v 



Xx.e Xx.e 



6 ^ {v) 



ei ^ Vi 



box 6 with Xi = ei ^ box e[xi:= Vi] 

Building. 



'^1 / / 
e ^ box e e ^ v 

unbox e V 



n+1 

e ^ V 



unbox 6 unbox n 



n+1 

e ^ V 



\ \ 
AX.e ^ AX.V 



n+1 

X X 



n+1 

e ^ V 



n+1 

Ci ^ Vi 



run 6 with Xi = ei run v with Xi = Vi 

n+1 

6i ^ Vi 

box 6 with Xi = ei i box e with Xi = Vi 
Stuck. 



n+1 

e ^ V 

n+2 

e ^ V 

n+1 n+1 

6 i ^ Vi 62 ^ V 2 

n+1 

ei 62 ^ t^l V2 



n+1 

e ^ V 



(e) ^ {v) 



n+1 

6 ^ C 



e ^ V ^ box e' 
unbox 6 err 



6i ^ u ^ Xx.e 
0 

6162 ^ err 



ei ^ Vi e[xi: = Vi] ^ v ^ {e') 

n 0 

run 6 with Xi = ei ^ err 



e ^ V ^ {e') 



Fig. 2. Big-Step Semantics 



Lemma 6 (Value Demotion). If v G then vin= v. 

A good property for multi-level languages is the existence of a bijection between 
programs 0 h e: and program representations 0 h (v): (t)^ . This property holds 

for AIM. In fact it is a consequence of the following result: 

Proposition 1 (Reflection). If F \- e: t”, then F~^^ h e: and e G 

Conversely, if v G and h v: then FG v.C . 
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3.1 Call-by-Name 

The difference between the call-by-name semantics and the call-by- value seman- 
tics for AIM is only in the evaluation rule for Application at level 0. For call- by- 
name, this rule becomes 

Cl Xx.e e[x: = 62 ] w 
() 

d €2 ^ V 

The Type Preservation proof need only be changed for the Application case. 
This is not problematic, since the Substitution Lemma for AIM has no value 
restriction. 

Theorem 2 (CBN Type Preservation). If h e: and e ^ v then 

u G P" and r+^ h 

3.2 Expressiveness 

MetaML’s type system has one Code type constructor, which tries to combine 
the features of the Box and Circle type constructors of Davies and Pfenning. 
This combination leads to the typing problem discussed in the introduction. In 
contrast, AIM’s type system incorporates both Box and Circle type construc- 
tors, thereby providing correct semantics for the following natural and desirable 
functions: 

1. unbox : [t] ^ t. This function executes closed code. AIM has no function of 
the opposite type t ^ [t], thus we avoid the “collapse” of types in the recent 
work of Wickline, Lee, and Pfenning [13]. Such a function does not exist in 
MetaML. 

2. up : t — > {t). This function corresponds to cross-stage persistence [11], in fact it 
embeds any value into an open fragment, including values of functional type. 
Such a function does not exist in A^. At the same time, AIM has no function 
of the opposite type (t) t, reffecting the fact that open code cannot be 
executed, up is expressible as Xx.{x). 

3. weaken: [t] — > (t). This is (almost) the composite of the two functions above, 

weaken reflects the fact that closed code can always be viewed as open code. 
AIM has no function of the opposite type (t) [t] . 

4. compile: [(f)] ^ [f]. This function allows us to convert a Boxed Bracket value 
into a Boxed value. It can be viewed as the essence of the interaction be- 
tween the Bracket and the Box type. Compile is not expressible (with the 
desired strictness behavior) in the language, but has the following operational 
semantics: 

e box e' e' (u') 
compile e box (u'io) 

Type Preservation is still valid with such an extension. 
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5. execute: [{t)] — > t. This function executes closed code. It can be defined in terms 
of Run-With as Aa;.run unbox x with x = x, and also in terms of Compile as 
Ax. unbox (compile x). 

Now, the MetaML example presented in the Introduction can be expressed in 
AIM as follows: 

- 1 val rec exp = box (f n n => f n x => 

if n=0 then <1> else < ~x * ~( (unbox exp) (n-1) x) >) 
with {exp=exp}; 

val exp = [fn] : [int -> <int> -> <int>] 

- 1 val exponent = box (f n n => 

<fn a => ~( (unbox exp) n <a>)>) 
with {exp=exp}; 

val exponent = [fn] : [int -> <int -> int>] 

-| val cube = compile (box ((unbox exponent) 3) 

with {exponent=exponent}) ; 

val cube = [fn a => a * (a * (a * 1))] : [int -> int] 

-| val program = compile (box < (unbox cube) 2> 

with {cube=cube}) 

val program = [(fn a => a * (a * (a * 1))) 2] : [int] 

-| unbox program; 
val it = 8 : int 

In AIM, asserting that a code fragment is closed (using Box) has become part 
of the responsibilities of the programmer. Furthermore, Compile is needed to 
explicitly overcome the default lazy behavior of Box. If Compile was not used 
in the above examples, the (Boxed code) values returned for cube and program 
would contain unevaluated expressions. 

Unfortunately, the syntax is verbose compared to that of MetaML. In future 
work, we hope to improve the syntax based on experience using AIM. In partic- 
ular, we plan to investigate an eager operational semantics for Box, which should 
simplify the formalization of MetaML constructs in AIM, and perhaps make the 
Compile combinator unnecessary. 



4 Embedding Results 

This section shows that other languages for staging computations can be trans- 
lated into AIM, and that the embedding respects the typing and evaluation. The 
languages we consider are A® [3], MetaML [10], and A'^ [4]. 
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4.1 Embedding of 

The embedding of into AIM is straightforward. In essence, corresponds 
to the Open fragment of AIM: 

t e Topen- ■ = b \ti ^ t 2 \ {t) 
e e Eopen- : = c I x I Cl 62 I Aa;.6 | (e) | ~e 

The translation (_0) between A^and AIM is as follows: (O^)*^ = 

(next e)0 = (eO), and (prev e)0 = ~ (eO). With these identifications the 
typing and evaluation rules for A*^ are those of AIM restricted to the relevant 
fragment. The only exception is the typing rule for variables, which in A*^ is 
simply r \- x:f^ if T a; = (this reflects the fact that A^ has no cross-stage 
persistence) . 

We write F I-q e: t and e v for the typing and evaluation judgments of A^, 
so that they are not confused with the corresponding judgments of AIM. 

Proposition 2 (Temporal Type Embedding). If F I-q e: is derivable in 

AO, then F^> h e^: (fO)” is derivable in AIM. 

Proposition 3 (Temporal Semantics Embedding). If e v is derivable 
in AO, then eO tiO is derivable in AIM. 

4.2 Embedding of MetaML 

The difference between MetaML and AIM is in the type system. We show that 
while AIM’s typing judgments are simpler, what is typable in MetaML remains 
typable in AIM. 

t G TMetaML- : = b \ ti ^ t2 \ {t) 

e G EMetaML- : = c | o: | 6i 62 | Xx.e \ (e) | ~e | run e 

MetaML’s typing judgment has the form A \~m e: (t, r)”, where t G T, n,r G N 
and A is a type assignment, that is, a finite set {xt: (ti,rj)”*|i G m} with the 
Xi distinct. We use the subscript M to distinguish MetaML’s judgments from 
AIM’s judgments. Figure 3 recalls the type system of MetaML [10]. 

Definition 3 (Acceptable Judgment). We say that a MetaML typing judg- 
ment {xp. {ti, rj)”*|z G m} \~m e: (t, r)” is acceptable if and only if\/i G m. Vi < 
r. 

Remark F A careful analysis of MetaML’s typing rules shows that typing judg- 
ments occurring in the derivation of a judgment 0 \~m e: (t, r)” are acceptable. In 
fact in a MetaML typing rule the premises are acceptable whenever its conclu- 
sion is acceptable, simply because the index r never decreases when we go from 
the conclusion of a type rule to its premises, Thus, we never get an environment 
binding with an r higher than that of the judgment. 
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r \~M c: {tc, r)" r \~M X'. {t, r)" ii F x = and m + r < n + p 

-T, (ti, r)" Em e: (t2, r)" F \~m ei: {F —> t2,r)'^ -T Em 62: (ti, r)" 

F Em Xx.e: (ti r)” F Em ei 62: (^2, »')" 

TEm e:(t,r)"+^ T Em e:((t),r)" E Em e:((t),r + l)" 

EEm (e):((t),r)" E Em ~e: (t, r)"+i E Em run e: (t, r)" 

Fig. 3. MetaML Typing rules 



Proposition 4 (MetaML Type Embedding). If {xp. G m} \~m 

e: {t, r)” is acceptable, then it is derivable in MetaML if and only if{xp | j g 

m} E e: t” is derivable in AIM. 

4.3 Embedding of 

Figure 4 summarizes the language A'^ [4]. We translate into the Closed 
fragment of AIM: 

t G Tciosed'- '.= b \ti ^ t2 I [t] 

e G Eciosed - : = c I a; I Cl 62 | Xx.e \ box e with Xi = Ci \ unbox e 

We need only consider typing judgments of the form {xp t^\i G m} E e: and 

evaluation judgments of the form e ^ v. These restrictions are possible for two 
reasons. If the conclusion of a typing rule is of the form {xp. t^\i G m} E e: t^ with 
types and terms in the Closed fragment, then also the premises of the typing 
rule enjoy such properties. When e is a closed term in the Closed fragment, the 

Tl 0 

only judgments e' ^ v' that can occur in the derivation of e u are such that 
n = 0 and e' and v' are closed terms in the Closed fragment. 

Definition 4 (Modal Type Translation). The translation of A'^ types is 
given by 

b^ = b {tl ^2)° =t^^t^ (nt)° = 

The translation of A'^ terms depends on a set X of variables, namely those 
declared in the modal context A. 

n A' 

X = unbox a; if x G X 

= y if y^ X 

(box e)^^ = box with {a; = a;|a; G FV(e) C X} 

(let box a; = 6i in e)°^ = (Aa;.e°^^^®^) 

(Ay.e)'^^ = Ay.e'^^ where y ^ X 
(ei 62)°^ = 6°^ 6°^ 
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Syntax 



Types t£Tu 
Expressions e £ 
Type assignments E, A 



= b \ ti ^ t 2 \ nt 

= X I Xx.e I ei 62 I box e | let box x = ei in 62 
= {xi:ti\i £ m} 



Type System 

A', r \-[j x:t if Ax = t 

A-, {r, x: t') !“□ e: t 



A-, r !“□ Xx.e: t' —> t 
T !“□ 6i : t — ^ t Z\; T £□ 62 T 



A-, r \-[j x:t if E® = t 

(A; x: t'), E !“□ 62: t A; E !“□ ei: Dt' 
A', r !“□ let box X = 6i in 62: t 



A; r !“□ 6i 62: t 

Big-Step Semantics 

6i Xx.e 62 v' e[x: = v'] v 
61, 62 V 

61 box 6 62 [x: = e] v 



Z\; 0 h □ e:t 
A: r hn box e: dt 



let box X = 61 in 62 v 



Xx.e Xx.e 



box 6 box e 



Fig. 4. Description of 



Proposition 5 (Modal Type Embedding). If A] F I-q e:t is derivable in 
A'^, then [Z\'^],E'^ h is derivable in AIM’s Closed fragment, where X 

is the set of variables declared in A, {xi\ti\i £ m}^ is {xi'.t^\i £ m}, and 
\{xi:ti\i & m}] is {x£ [tj |z £ m}. 

The translation of A*^ into the AIM’s Closed fragment does not preserve eval- 
uation on the nose (that is, up to syntactic equality). Therefore, we need to 
consider an administrative reduction. 

Definition 5 (Box- Reduction). The ^box reduction is given by the rewrite 
rules 



unbox (box e) — > e 

box e' with Xi = €i, X = box e, xj = ej —> box e^[x: = box e] with Xi = Ci, xj = ej 
where e is a closed term of the Closed fragment. 



Lemma 7 (Properties of Box- Reduction). The -^box reduction on the Closed 
fragment satisfies the following properties: 

— Subject Reduction, that is, T \- e:t and e ^box e' imply T \- e':t 
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— Confluence and Strong Normalization 

— Compatibility with Evaluation on closed terms, that is, e\ v\ and e\ -^box 
62 imply that exists V2 s.t. vi -^tox and 62 ^ V2- 



Lemma 8 (Substitutivity). Civen a closed term eg € Efj the following prop- 
erties hold: 

— e^^[y: = Cq ®] = {e[y: = cq])^^ , provided y X 

- 6°^^^^} [x: = box 6°®] ^box (e[x: = eo])°^ 



Proposition 6 (Modal Semantics Embedding). If e G E\j is closed and 
e V is derivable in then there exists v' such that v' and v' -I^box 

5 Related Work 

Multi-stage programming techniques have been studied and used in a wide vari- 
ety of settings [11]. Nielson and Nielson present a seminal detailed study into a 
two-level functional programming language [9]. Davies and Pfenning show that 
a generalization of this language to a multi-level language called gives rise to 
a type system related to a modal logic, and that this type system is equivalent 
to the binding-time analysis of Nielson and Nielson [4] . 

Gomard and Jones [6] use a statically-typed two-level language for partial evalu- 
ation of the untyped A-calculus. This language is the basis for many binding-time 
analyses. 

Gliick and Jprgensen study partial evaluation in the generalized context where 
inputs can arrive at an arbitrary number of times rather than just two [5], and 
demonstrate that binding-time analysis in a multi-level setting can be done with 
efficiency comparable to that of two-level binding time analysis. 

Davies extends the Gurry-Howard isomorphism to a relation between temporal 
logic and the type system for a multi-level language [3] . 

Moggi [7] advocates a categorical approach to two-level languages based on in- 
dexed categories, and stresses formal analogies with a categorical account of 
phase distinction and module languages. 



Acknowledgments: We would like to thank Bruno Barbier, Jeff Lewis, Emir 
Pasalic, Yannis Smaragdakis, Eelco Visser, Phil Wadler and Lisa Walton for 
comments on a draft of the paper. 
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Abstract. We describe a system which decompiles (reverse engineers) 
C programs from target machine code by type-inference techniques. This 
extends recent trends in the converse process of compiling high-level lan- 
guages whereby type information is preserved during compilation. The 
algorithms remain independent of the particular architecture by virtue 
of treating target instructions as register-transfer specihcations. Target 
code expressed in such RTL form is then transformed into SSA form 
(undoing register colouring etc.); this then generates a set of type con- 
straints. Iteration and recursion over data-structures causes synthesis of 
appropriate recursive C structs; this is triggered by and resolves occurs- 
check constraint violation. Other constraint violations are resolved by C’s 
casts and unions. In the limit we use heuristics to select between equally 
suitable C code — a good GUI would clearly facilitate its professional use. 



1 Introduction 

Over the last forty years there has been much work on the compilation of 
higher-level languages into lower-level languages. Traditionally such lower-level 
languages were machine code for various processors, but there has been growing 
widening of the concept of compilation on one hand to permit the lower-level lan- 
guage to be a language like C (often viewed as a ‘universal assembler language’) 
and on the other to accompany the translation of terms by a corresponding trans- 
lation of types — good exemplars are many internal phases of the Glasgow Haskell 
Compiler [4] which is taken to its logical conclusion in Morrisett et al.’s [8] intro- 
duction of ‘Typed Assembly Language’. A related strand is Necula and Lee’s [9] 
compiler for proof-carrying code in which user types (including a richer set of 
types containing value- or range-specification) and compiler-generated types or 
invariants accompany target code (‘proof-carrying code’) to enable code to be 
safely used within a security domain. 

Two points which can be emphasised are: 

— preserving type information increases the reliability of a compiler by allowing 
it (or subsequent passes) often to report on internal inconsistency if an invalid 
transformation occurs instead of merely generating buggy code; and 

* A preliminary form of this work was presented at the APPSEM’98 workshop in Pisa. 
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— compilers are in general many-to-one mappings in which the target code 
is selected from various equivalent target-code sequences by some notion 
of efficiency — the more optimising a compiler, in general, the greater the 
number of source-code phrases that map to a given (generable) target-code 
sequence. 

We consider the use of types and type-inference for the reverse process of 
decompilation, often called reverse engineering. For the purposes of this paper 
we take the higher-level code to be C and the lower-level code to be register 
transfer language (RTL).^ RTL can be used to express various machine codes in 
architecture independent manner, but in examples we often use a generic RISC- 
like instruction set. Another important application which drove this work was 
a large quantity of BCPL [10] legacy code. BCPL was an untyped fore-runner 
of C popular at Cambridge for low-level implementation until its replacement 
with ANSI C around 10 years ago. Being untyped it has a single notion of vector 
which conflates the notions of array and record types in the same way that 
assembler code does. BCPL is easily translatable to RTL code (and indeed source 
names can be preserved within RTL as annotations) but the challenge was to 
invent appropriate structure or array types for data-structures just represented 
by pointers to vectors. 

One might wonder where the RTL code comes from. It can be obtained by 
simple disassembly (and macro-expansion of instructions to RTL form) of code 
from assembler files, from object files, directly from compiler output or even from 
DLL’s. Note that currently we assume that code is reasonably identified from 
data and in particular the current system presumes that procedure boundaries 
(and even — but less critically — procedure names) are available. 

Now we turn to one of the central issues of decompilation — that compilation 
is a many-to-one map means that we must choose between various plausible al- 
ternative high-level representations of given RTL code. This is instantly obvious 
for the names of local variables which are in general lost in compilation and 
need to be regenerated; although in general we can only give these rather boring 
names, we can also recover information from a relocation or symbol table (e.g. in 
a ELF executable) or from a GUI-driven database to aid serious redevelopment 
of legacy code. However, there are more serious issues. Identifying loops in a 
reducible flowgraph is fairly easy but since a good compiler will often translate 
a “while (e) C”’ loop to a loop of the form 

if (e) { do C while (e) ; } 

we must be prepared to select between or offer the user a choice between various 
alternatives much like names above. 

Note that we do not expect to have types [8] or assertions [9] in the machine 
code (but if we do these may significantly aid decompilation — it seems unfor- 
tunate if aids to program reliability and security make the code-breakers task 
easier too!). See section 7. 

^ Note the notion of source- and target-language is slightly tangled for a decompiler 
and so we will stick to C and RTL for concreteness. 
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We briefly justify decompilation. Apart from the obvious nefarious uses, there 
are real desires (e.g. in the telecoms industry) to continue to exploit legacy 
code with guaranteed equivalence to its previous behaviour. Additionally, we 
expect this project to cast further light on the uses of types of various forms in 
compilation, including proof-carrying code. 

Apart from the glib statement that we decompile RTL to C, certain things 
do need to be made more precise. We will assume that the RTL has 8-, 16-, 32- 
and (possibly) 64-bit memory accesses and additionally a push-down stack for 
allocating temporaries and locals via push and pop. Moreover, the generated C 
will assume that char, short, int and long will represent these types, unsigned 
can be used as a qualifier as demanded by the code (e.g. triggered by unsigned 
division, shift or comparison, or by user GUI interaction) but otherwise signed 
forms of these types are generated. Pointers are currently assumed to be 32-bit 
values and again can only be distinguished from int values by their uses. We 
will see type-inference as the central driver of this process. 

This work represents a position intermediate between traditional reverse engi- 
neering viewpoints. On one hand, there is decompilation work which has mainly 
considered control restructuring and tended to leave variables as int, to be type- 
cast on use as necessary (Cifuentes [1] is a good example). On the other hand, 
the formal methods community has tended to see reverse engineering as recon- 
structing invariants and specifications (e.g. by Hoare- or Dijkstra-style weakest 
precondition or strongest postcondition techniques — see for example Gannod 
and Gheng [5]) from legacy code so that it may be further manipulated. It is 
claimed that a type-based approach can be used for gross-level structuring auto- 
matically (possibly with a GUI driver for major choice resolution) whereas exact 
formal methods techniques are more limited in the size of acceptable problem 
(e.g. due to the need to prove theorems). 

2 Intuitive example 

Gonsider the following straight-line code 

f : Id.w 4 [rO] ,r0 

mul r0,r0,r0 
xor r0,rl,r0 
ret 

and a procedure calling standard which uses ri as argument and result registers. 
It is apparent that f has (at least) two arguments — see later — but for now we 
assume exactly two arguments. It is clear that f could be expressed as 

int f(int rO, int rl) 

{ rO = *(int *) (rO+4) ; 
rO = rO * rO; 
rO = rl ~ rO; 
return rO ; 

} 
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However, if we break register uses into live ranges and give each a separate name 
we get: 

int f(int rO, int rl) 

{ int rOa = *(int *) (rO+4) ; 
int rOb = rOa * rOa; 
int rOc = rl ~ rOb; 
return rOc; 

> 

Now it is apparent here that argument rO could be written as type (int *) 
instead of (int) which allows *(int *) (rO+4) to be replaced by *(rO+l) or 
its syntactically equivalent form rO [1] Moreover (modulo taking care not to 
violate any of C’s rules concerning side-effects and sequence points), variables 
only used once can be folded into their referencing expressions yielding 

int f(int *r0, int rl) 

{ int rOa = rO [1] ; 

return rl ~ (rOa * rOa) ; 

> 

There is now a further issue of stylistic choice as to whether the above code is 
preferred or the alternative: 

int f(int *r0, int rl) ; 

{ return rl ~ (rO[l] * rO[l]); 

} 

which simply may have generated the original code as a result of a compiler’s 
common sub-expression phase. 

We recall the discussion in the introduction in which we observed that the 
more optimising a compiler the more pieces of code are mapped into a given, 
possibly optimal, form. A good correctness-preserving heuristic will select one 
(hopefully readable) form (a maximum- valued solution to various rules) . A GUI 
user interface could select between wide-scale revision (i.e. seeking alternative 
local — to the constraint solver — maximum) or by demanding a choice between 
syntactic forms on a local — to the generated source code — basis. 

3 SSA — Single Static Assignment 

The Single Static Assignment (SSA) form (see e.g. [2]) is a compilation technique
to enable repeated assignments to the same variable (in flowgraph-style code) to
be replaced by code in which each variable occurs (statically) as a destination
exactly once. We use the same technique for decompilation because we wish
to undo register-colouring optimisations whereby objects of various types, but
having disjoint lifetimes, are mapped onto a single register.

¹ Note the possibility that r0 could be given a type (struct { int m0, m4, m8; })
which would then lead to int r0a = r0->m4;. There is a notion of polymorphism
here and we return to this point later.

In straight-line code the transformation to SSA is straightforward: each
variable v is replaced by a numbered instance v_i of v. When an update to v occurs
this index is incremented. This results in code like

v = 3; v = v+1; v = v+w; w = v*2;

(with next available index 4 for w and 7 for v) being mapped to

v7 = 3; v8 = v7+1; v9 = v8+w3; w4 = v9*2;

On path-merge in the flowgraph we have to ensure instances of such variables 
continue to cause the same data-flow as previously. This is achieved by placing a 
logical (single static) assignment to a new common variable on the path-merge 
node, which captures the effect of two separate assignments on the arcs leading 
to the path-merge node. This is conventionally represented by a so-called
φ-function at entry to the path-merge node. The intent is that φ(x, y) takes value
x if control arrived from the left arc or y if it arrived from the right arc; the value
of the φ-function is used to define a new singly-assigned variable. Thus consider

if (p) { v = v+1; v = v+w; } else v = v-1;
w = v*2;

which would map to (only annotating v and starting at 4)

if (p) { v4 = v3+1; v5 = v4+w; } else v6 = v3-1;
v7 = φ(v5,v6); w = v7*2;

In examples our variable names will be based on those of machine registers r0,
r1, etc.; instances of these will be given an alphabetic suffix, thus r0a, r4e, etc.
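
The straight-line renaming described above can be sketched as follows. This is an illustrative sketch under our own assumptions, not the decompiler's implementation: the types Instr and Renamed and the function ssa_rename are hypothetical, variables are restricted to single letters, and path merges (φ-functions) are not handled.

```c
#include <stddef.h>

/* One three-address instruction: dst = src1 op src2.
   Variables are single letters 'a'..'z'; 0 marks a constant operand. */
struct Instr { char dst, src1, src2; };

/* SSA indices chosen for one instruction; a constant operand gets -1. */
struct Renamed { int dst, src1, src2; };

/* Rename a straight-line block. cur[v] is the index of the instance of
   variable v that is live on entry; next[v] is the next unused index. */
void ssa_rename(const struct Instr *code, size_t n,
                int cur[26], int next[26], struct Renamed *out)
{
    for (size_t i = 0; i < n; i++) {
        /* Uses refer to the instance that is live before the update. */
        out[i].src1 = code[i].src1 ? cur[code[i].src1 - 'a'] : -1;
        out[i].src2 = code[i].src2 ? cur[code[i].src2 - 'a'] : -1;
        /* Each definition receives a fresh index. */
        int d = code[i].dst - 'a';
        cur[d] = next[d]++;
        out[i].dst = cur[d];
    }
}
```

Applied to v = 3; v = v+1; v = v+w; w = v*2; with next index 7 for v and 4 for w, the destinations are numbered 7, 8, 9 and 4, matching the mapping shown earlier in this section.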

4 Type reconstruction 

Our type reconstruction algorithm is based on that of Milner’s algorithm W [7] 
for ML; it shares the use of unification but involves a rather more complicated 
type system and delays unification until all constraints are available. Unification 
failure is used to trigger reconstruction of C types in a way which enables the 
constraint resolution failure to be repaired. 

The C algebra of types does not neatly express the type concepts needed
during type reconstruction² so we use an internal type algebra (for types t,
struct members s and register types r) given by:

t ::= char | short | int | ptr(t) | array(t) | mem(s) | union(t1, ..., tk)
s ::= n1 : t1, ..., nk : tk
r ::= int | ptr(t)

where the ni range over natural numbers and k > 0. α and β are respectively
also used to range over t and r (to highlight ‘new’ type variables generated
during unification). The notation mem(s) represents a storage type known to
contain various types at identified offsets (i.e. it represents C’s structs and
unions and, as we will see, may also represent arrays only accessed by constant
subscripts). While the above type grammar allows user interaction to select all
C types, for automatic inference it is convenient to require that all ptr(t) types
are of the form ptr(mem(s)), e.g. by selecting s = 0 : t. However, we will still
feel free to write (e.g.) ptr(int) to be understood as shorthand. Finally, for the
purposes of this paper, union(t1, ..., tk) is not used as it can be simulated by
mem(0 : t1, ..., 0 : tk).

² Indeed sometimes user-interaction may be desirable to select between C alternatives.

Each machine code instruction now generates constraints on the types of its 
operands in a straightforward manner. For example, adopting the notation that 
tk is the type ascribed to register rk, we have 



instruction              generated constraint
-----------------------  ----------------------------------------------
mov   r4,r6              t6 = t4
ld.w  n[r3],r5           t3 = ptr(mem(n : t5))
xor   r2a,r1b,r1c        t2a = int, t1b = int, t1c = int
add   r2a,r1b,r1c        t2a = ptr(α), t1b = int, t1c = ptr(α) ∨
                         t2a = int, t1b = ptr(α′), t1c = ptr(α′) ∨
                         t2a = int, t1b = int, t1c = int
ld.w  (r5)[r0],r3        t0 = ptr(array(t3)), t5 = int ∨
                         t0 = int, t5 = ptr(array(t3))
mov   #42,r7             t7 = int
mov   #0,r7              t7 = int ∨ t7 = ptr(α″)



Note that overloaded C operators such as + naturally lead to disjunctive type
constraints (compare add with xor). This also applies to indexed load and store
instructions³ and the constant zero, which is conventionally used to implement
the null pointer constant.

Type unification is deferred until section 5, but for now it suffices to consider 
it as Herbrand unification where occurs-check failures are repaired rather than 
causing premature termination. 

4.1 Inventing recursive data-types from loops or recursion 

Consider the C recursive data type 

struct A { int hd; struct A *tl; }; 

and the iterative and recursive procedures for summing its elements given in 
Figs. 1 and 2. (Note that for convenience the assembler code is given as a
compiler might produce, with the original C code as comment and with
generated label names, but note that code and type reconstruction only depends on
the machine instructions.) Figs. 3 and 4 show the example assembler code in
SSA form and with generated type constraints. We now turn to the process of
resolving the type constraints for f.

³ Here we assume such instructions do no automatic scaling.

int f (struct A *x)
{ int r = 0;
  for (; x!=0; x = x->tl) r += x->hd;
  return r;
}

f:      mov   #0,r1
        cmp   #0,r0
        beq   L4F2
L3F2:   ld.w  0[r0],r2
        add   r2,r1,r1
        ld.w  4[r0],r0
        cmp   #0,r0
        bne   L3F2
L4F2:   mov   r1,r0
        ret

Fig. 1. Iterative summation of a list

int g (struct A *x)
{ return x==0 ? 0 : x->hd + g(x->tl);
}

g:      push  r8
        mov   r0,r8
        cmp   #0,r8
        bne   L4F3
        mov   #0,r0
        br    L8F3
L4F3:   ld.w  4[r8],r0
        jsr   g
        ld.w  0[r8],r1
        add   r1,r0,r0
L8F3:   pop   r8
        ret

Fig. 2. Recursive summation of a list



f:                                   tf = t0 → t99
        mov   r0,r0a                 t0 = t0a
        mov   #0,r1a                 t1a = int ∨ t1a = ptr(α1)
        cmp   #0,r0a                 t0a = int ∨ t0a = ptr(α2)
        beq   L4F2
L3F2:   mov   φ(r0a,r0c),r0b         t0b = t0a, t0b = t0c
        mov   φ(r1a,r1c),r1b         t1b = t1a, t1b = t1c
        ld.w  0[r0b],r2a             t0b = ptr(mem(0 : t2a))
        add   r2a,r1b,r1c            t2a = ptr(α3), t1b = int, t1c = ptr(α3) ∨
                                     t2a = int, t1b = ptr(α4), t1c = ptr(α4) ∨
                                     t2a = int, t1b = int, t1c = int
        ld.w  4[r0b],r0c             t0b = ptr(mem(4 : t0c))
        cmp   #0,r0c                 t0c = int ∨ t0c = ptr(α5)
        bne   L3F2
L4F2:   mov   φ(r1a,r1c),r1d         t1d = t1a, t1d = t1c
        mov   r1d,r0d                t0d = t1d
        ret                          t99 = t0d

Fig. 3. Iterative sum in SSA form with generated type constraints



Type reconstruction (for f using the constraints in Fig. 3) now proceeds by:

Occurs-check constraint failure:

    t0c = t0b = ptr(mem(4 : t0c)) = ptr(mem(0 : t2a))

Breaking cycle with:

    struct G { t2a m0; t0c m4; ... } i.e.

    t0c = ptr(mem(0 : t2a, 4 : t0c)) = ptr(struct G)

A record is kept that this particular mem is represented by struct G
which can then be used for printing types. Solving gives two solutions:

    t0 = t0a = t0b = t0c = ptr(struct G)
    t99 = t1a = t1b = t1c = t1d = t2a = t0d = int
    tf = ptr(struct G) → int

and

    t0 = t0a = t0b = t0c = ptr(struct G)
    t2a = int
    t99 = t1a = t1b = t1c = t1d = t0d = ptr(α4)
    tf = ptr(struct G) → ptr(α4)
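
The occurs-check that triggers the invention of struct G can be sketched over a small term representation. This is a minimal illustration of ours: the Type and Field layouts and the function occurs are hypothetical names, type terms are assumed to be finite trees, and the repair step (naming the offending mem as a struct) is only indicated in the note after the code.

```c
#include <stddef.h>

enum Kind { T_INT, T_VAR, T_PTR, T_MEM };

struct Type;
struct Field { int offset; const struct Type *type; };

struct Type {
    enum Kind kind;
    int var;                     /* T_VAR: variable number           */
    const struct Type *target;   /* T_PTR: pointed-to type           */
    const struct Field *fields;  /* T_MEM: members at known offsets  */
    size_t nfields;
};

/* Return 1 if type variable `var` occurs anywhere inside term `t`. */
int occurs(int var, const struct Type *t)
{
    switch (t->kind) {
    case T_VAR:
        return t->var == var;
    case T_PTR:
        return occurs(var, t->target);
    case T_MEM:
        for (size_t i = 0; i < t->nfields; i++)
            if (occurs(var, t->fields[i].type))
                return 1;
        return 0;
    default:                     /* T_INT contains no variables */
        return 0;
    }
}
```

For the constraint t0c = ptr(mem(0 : t2a, 4 : t0c)) the check succeeds for t0c, which is the point at which the mem term would be recorded under a fresh struct name instead of unification failing.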

g:                                   tg = t0 → t99
        mov   r0,r0a                 t0 = t0a
        push  r8
        mov   r0a,r8a                t8a = t0a
        cmp   #0,r8a                 t8a = int ∨ t8a = ptr(α2)
        bne   L4F3
        mov   #0,r0a                 t0a = int ∨ t0a = ptr(α1)
        br    L8F3
L4F3:   ld.w  4[r8a],r0b             t8a = ptr(mem(4 : t0b))
        jsr   g                      t0 = t0b, t0c = t99
        ld.w  0[r8a],r1a             t8a = ptr(mem(0 : t1a))
        add   r1a,r0c,r0d            t1a = ptr(α3), t0c = int, t0d = ptr(α3) ∨
                                     t1a = int, t0c = ptr(α4), t0d = ptr(α4) ∨
                                     t1a = int, t0c = int, t0d = int
L8F3:   mov   φ(r0a,r0d),r0e         t0e = t0a, t0e = t0d
        pop   r8
        ret                          t99 = t0e

Fig. 4. Recursive sum in SSA form with generated type constraints



The second solution is a parasitic solution which is caused by the effective over- 
loading of addition and the constant zero on both pointers and integers as dis- 
cussed earlier. It corresponds to creating the variable r in the original code as 

char *r = 0 ; 

and then adding on the (int) elements x->hd by address arithmetic. We believe 
that this false solution (it corresponds to code which is not strictly ANSI con- 
formant) can be eliminated by enhancing the type system with a weak pointer 
type which is not suitable for arithmetic (cf. void * in C); however this awaits 
experiment. 

Having obtained, then, the solution tf = ptr(struct G) → int with

struct G { t2a m0; t0c m4; ... }

we can set about mapping the assembly code into appropriate C. Note that no 
information has been derived about the size of struct G; the use of the ellipsis 
above corresponds to the type “record type unknown apart from having field m” 
obtained for a Standard ML function such as 

fun f(x) = x.m; 

We model this in concrete C by creating an optional padding type Tpad.⁴ It is
now simple to translate the above code into the following C by re-constituting
expressions from variables only used once and by pattern matching (out of the
scope of this paper) for commands to obtain:

struct G { int m0; struct G *m4; Tpad m8; };
int f (struct G *x)
{ int r = 0;
  if (x != 0)
    do { r += x->m0; x = x->m4; } while (x != 0);
  return r;
}

⁴ Unfortunately for us, C does not allow zero-sized types and so we must allow the
field to be optional or allow the C pre-processor to macro-expand away Tpad if later
information (e.g. from uses of the function f) indicates its size to be zero.

Further pattern matching can reproduce the original for loop. 
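
The claim that the guarded do-while and the for loop are interchangeable can be checked with a small sketch. This is our own illustration: the function names f_dowhile and f_for are hypothetical, the optional padding member Tpad is omitted (its size is assumed to be zero), and on hosts with 8-byte pointers the member named m4 will not actually sit at byte offset 4.

```c
/* Reconstructed type, with the optional padding member left out. */
struct G { int m0; struct G *m4; };

/* Shape recovered directly from the branch structure of the machine code. */
int f_dowhile(struct G *x)
{
    int r = 0;
    if (x != 0)
        do { r += x->m0; x = x->m4; } while (x != 0);
    return r;
}

/* Shape after the further pattern-matching step. */
int f_for(struct G *x)
{
    int r = 0;
    for (; x != 0; x = x->m4) r += x->m0;
    return r;
}
```

Both functions test x before the first iteration and again after each step, so they visit the same cells in the same order, including the empty-list case.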

Incidentally, note that the recursive list summation function g results in an 
equivalent set of constraints and therefore can be similarly decompiled into: 

int g (struct G *x) 

{ int r; 

if (x==0) 
r = 0; 

else 

{ int t = g(x->m4) ; 
r = t + x->m0; 

} 

return r; 

} 

But why is this not nearly so close to the original (even if it is one of the common 
coding styles for this type of recursion)? Consider the expression 

x->hd + g(x->tl) 

which ANSI C declares to be implementation defined if g() side-effects x->hd and
otherwise allows the compiler to choose whether to evaluate x->hd or the call
to g first; sensible compilers would generally evaluate g(x->tl) first since this
reduces register pressure. However, conversely, we are not in general at liberty to
fold a sequential call to g() and an addition of x->hd into the original code and
hence the above decompilation is as good as we can obtain under the assumption 
that procedure calls (e.g. jsr g) can affect memory arbitrarily. However, if we 
could determine that the call jsr g cannot result in side-effects (on x->hd) then 
the following simplifications are triggered: 

— the code for the else-part could be reconstructed to 

r = x->m0 + g(x->m4) ; 

which is only valid C if g() cannot affect x->m0

— then, given that both consequents assign to r, the whole body simplifies to 
the original 

return x==0 ? 0 : x->hd + g(x->tl);

This short-coming of the present type-based approach could be remedied by
extending function types to include details of side effects, using a type and
effect system (also known as an annotated type system); see for example [11].
Finally, we observe that all the suggested decompilations of f and g above
yield identical target code when processed by the compiler (ncc) which was used 
to produce the sample code for decompilation. Of course, we might be able to 
use a better compiler the second time round! 
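
To see why the call and the load of x->hd cannot be folded into one expression without knowledge of side effects, consider the following sketch. It is our own illustration and not taken from the decompiler: g_effect is a hypothetical callee standing in for an arbitrary jsr g that writes to memory, and the two sum functions spell out the two evaluation orders explicitly.

```c
struct A { int hd; struct A *tl; };

struct A *victim;                 /* cell whose hd the callee overwrites */

/* Hypothetical callee with a side effect on memory visible to its caller. */
int g_effect(struct A *x)
{
    (void)x;
    if (victim) victim->hd = 100;
    return 1;
}

/* Order found in the machine code: call first, then load x->hd. */
int sum_call_first(struct A *x)
{
    int t = g_effect(x->tl);
    return t + x->hd;
}

/* Order a compiler is also free to choose for x->hd + g(x->tl). */
int sum_load_first(struct A *x)
{
    int h = x->hd;
    return h + g_effect(x->tl);
}
```

When victim points at the argument cell the two orders give different sums, so the sequential form is the only safe reconstruction unless the callee is known not to affect x->hd.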



4.2 When structs cannot resolve type conflicts 

Although our decompiler needs internally a richer set of types than ML (e.g. 
we have seen that ld.w 4[r0],r1 leads us to reason that r0 may be a pointer
to any type with a 32-bit component at offset 4, including both structures and 
arrays) we have exploited constraint gathering and solution by unification much 
as we might find in an ML compiler. In section 5 we will discuss the additional 
ordering on types (and non-Herbrand unification) occasioned by code which can 
reflect either array element or struct member access. 

(Herbrand) unification may fail for two reasons. Firstly, a type variable may 
need to be unified with a term containing it — this is solved as above by syn- 
thesising recursive data types. Secondly, we may have a straightforward clash of 
type-constructors and it is to this case which we now turn. 

Consider the code: 

h:      ld.w  4[r0],r1
        xor   r1,r0,r0
        ret

where r0 is constrained to be an int because of its appearance as the source of
an xor instruction and as a pointer to store (containing an int at offset 4) due
to the ld.w instruction. (Note that all the uses of r0 except for the destination
of the xor instruction form a single live range and so the transformation to SSA
form used in the introduction does not help here.) So we attempt to unify int
with ptr(mem(4 : int)) and find no solution. Such situations are deferred until
the global set of constraint failures is available (here there are no more) and
then the application to typing outlined by Gandhe et al. [3] for finding maximal
consistent subsets of inconsistent sets is applied.

Here we find a benefit of using C as the high-level language for decompilation
in that it can express such code by casts or union types. C’s union type can
express the solution trivially as

int h (union {int i; int *p;} x) { return x.p[1] ^ x.i; }

but this is not a very common (nor very readable) form of C and indeed is 
not strictly conforming in ANSI C (reading a union at a different type from 
what it was written is forbidden). We would prefer to restrict the synthesis of 
unions to within generated structs which contain also a discriminator. Cast- 
based alternatives seem better in this case and we get three plausible solutions:

int h1(int x) { return *(int *)(x+4) ^ x; }
int h2(int *x) { return x[1] ^ (int)x; }
struct h3arg { Tpad1 m0; int m; Tpad2 m8; };
int h3 (struct h3arg *x) { return x->m ^ (int)x; }

Note we have suppressed the variant of h1 in which x is cast to a new struct type
which contains an int at offset 4; clearly a skilled program re-constructor might 
be able to specify the *(int *) (x+4) more precisely, but inventing a separate 
new datatype for each such access would clutter code for no clear benefit. We 
will prefer option h3 by default (with the understanding that the generated 
struct h3arg will be unified with arguments of callers), leaving array creation 
to be triggered by non-constant indexing (or user GUI interaction); the next 
section investigates the choice between arrays and structs in more detail. 

Of course, one justification of using C in this paper is that the above assembler 
code could not plausibly be generated by any Haskell compiler — C is more 
expressive in this sense. 
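
The three cast-based variants can be compared on a concrete host with the sketch below. It is an illustration under our own assumptions: intptr_t replaces int wherever an address is held (so the code also works with 8-byte pointers), Tpad1 is taken to be a 4-byte int, the trailing padding Tpad2 is taken to be empty, and indexing past the first member through an int * relies on the usual contiguous layout rather than on a guarantee of the C standard.

```c
#include <stdint.h>

typedef int Tpad1;                      /* hypothetical: 4 bytes of padding */

struct h3arg { Tpad1 m0; int m; };      /* trailing Tpad2 assumed empty */

/* x as an integer that is cast to a pointer at the point of use. */
int h1(intptr_t x)      { return *(int *)(x + 4) ^ (int)x; }

/* x as a pointer to int that is cast to an integer for the xor. */
int h2(int *x)          { return x[1] ^ (int)(intptr_t)x; }

/* x as a pointer to a generated struct with an int at offset 4. */
int h3(struct h3arg *x) { return x->m ^ (int)(intptr_t)x; }
```

All three read the int at byte offset 4 and xor it with the (truncated) address, so they return the same value for the same object.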

4.3 Arrays versus structs 

The approach we have taken so far has been to use structs whenever possible. 
While these, together with casts and address arithmetic, would suffice for decom- 
pilation, it is more rational to trigger array synthesis when indexing instructions 
occur, whether they be manifest: 

        ld.w  (r5)[r0],r3

or more indirectly coded (a non-constant int value being used for addition or
subtraction at pointer type) such as

        add   r5,r0,r1
        ld.w  0[r1],r3

Such an indexing instruction (for the purposes of this discussion we will
assume scaling is not done in hardware, thus the effective address is (r0) + (r5))
generates constraints (as explained earlier):

        ld.w  (r5)[r0],r3    t0 = ptr(array(β)), t5 = int, t3 = β ∨
                             t0 = int, t5 = ptr(array(β)), t3 = β

where β is constrained to be a register type, i.e. int or ptr(α).

If the constraints for a given pointed-to type are all struct types (resulting 
from constant offsets) then the resulting unified type is also struct as in the 
previous subsection. Otherwise, if all accesses via a pointer are of the same size, 
e.g. all 32-bit accesses, then the unified type is array, otherwise a union type is 
generated, e.g. the constraints for 

        ld.b  0[r0],r1
        ld.b  48[r0],r2
        ld.w  (r5)[r0],r3

unify to yield

union G { struct { char m0; char pad1[47]; char m48; } u1;
          int u2[];
} *r0;
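
The choice between struct, array and union described in the preceding paragraph can be sketched as a small classifier. This is our own reading of the rule, not the decompiler's code: the Access record, the Shape enumeration and the function classify are hypothetical names, and only the width and the constant-versus-variable nature of each access are modelled.

```c
#include <stddef.h>

enum Shape { SHAPE_STRUCT, SHAPE_ARRAY, SHAPE_UNION };

/* One memory access made through a given pointer register. */
struct Access {
    int indexed;   /* 1 for (r5)[r0]-style variable indexing, 0 for n[r0] */
    int size;      /* access width in bytes: 1 for ld.b, 4 for ld.w       */
};

enum Shape classify(const struct Access *a, size_t n)
{
    int any_indexed = 0, same_size = 1;
    for (size_t i = 0; i < n; i++) {
        if (a[i].indexed) any_indexed = 1;
        if (a[i].size != a[0].size) same_size = 0;
    }
    if (!any_indexed) return SHAPE_STRUCT;     /* only constant offsets  */
    return same_size ? SHAPE_ARRAY : SHAPE_UNION;
}
```

For the three loads above (two byte loads at constant offsets and one indexed word load) the classifier reports a union, in line with the type shown.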

Inferring limits for arrays requires, in general, techniques beyond those available 
to our type-based reconstruction. If presented with proof-carrying code [9] then 
array bounds could be extracted from code proved to be safe. To a large extent 
however, C programmers do not take great care with array size specifications, 
especially when passed as arguments since the C standard requires formal array 
parameters to be mapped to pointers thereby losing size information. 

Although currently not implemented, note that a GUI could be used to direct 
that the above union should instead be decompiled as 

struct G { char m0; char pad1[3]; int m4[15]; char m48; } *r0;

when it is clear to a user that the array is actually part of the struct. We return 
to this point in section 6.1. 

5 Type Unification 

Unification of our types is Herbrand-based with the following additional rules, 
i.e. the cases below are tried in order if Herbrand unification fails. 

— type variable α unifies with type t containing α to yield t[struct G/α] with
an auxiliary definition of struct G being produced.

— array(t) and mem(n1 : t1, ..., nk : tk) unify to array(t) when (∀i) ti = t.

— mem(s1) and mem(s2) unify to mem(s1 ∪ s2); note that keeping conflicting
items (e.g. mem(0 : int, 1 : char)) is not an error since this may later be
used to replace the member at offset zero with a union in the generated C.
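
The second and third rules can be sketched for the simple case where member types are scalars. This is a minimal illustration under that assumption: the names Scalar, Member, unify_mem and unify_array_mem are hypothetical, and nested pointer or mem member types are left out.

```c
#include <stddef.h>

enum Scalar { S_CHAR, S_SHORT, S_INT };

struct Member { int offset; enum Scalar type; };

/* Append m to out[0..n) unless an identical member is already present. */
static size_t add_member(struct Member *out, size_t n, struct Member m)
{
    for (size_t k = 0; k < n; k++)
        if (out[k].offset == m.offset && out[k].type == m.type)
            return n;
    out[n] = m;
    return n + 1;
}

/* mem(s1) and mem(s2) unify to mem(union of s1 and s2). out must have
   room for n1 + n2 entries. Conflicting members at one offset are kept. */
size_t unify_mem(const struct Member *s1, size_t n1,
                 const struct Member *s2, size_t n2, struct Member *out)
{
    size_t n = 0;
    for (size_t i = 0; i < n1; i++) n = add_member(out, n, s1[i]);
    for (size_t j = 0; j < n2; j++) n = add_member(out, n, s2[j]);
    return n;
}

/* array(t) and mem(n1 : t1, ..., nk : tk) unify to array(t) only when
   every ti equals t. Returns 1 on success and 0 otherwise. */
int unify_array_mem(enum Scalar t, const struct Member *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i].type != t) return 0;
    return 1;
}
```

Keeping both 0 : int and 1 : char in the merged set, rather than failing, leaves the decision about an overlapping union to the later type-selection step.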

6 Selecting C types for generated types 

As noted earlier, generated types are more expressive than C types, and this
sometimes means that a choice has to be made from among various
possibilities. Moreover certain C types are less commonly used than others, e.g.
a function parameter is more likely to be described as (int *) rather than
(int (*)[10]), whereas appropriate uses would lead to identical target code.
The default method for selecting types which result from unification is as follows:

— translate char, short, int as char, short, int; 

— translate ptr(t) to T * where T translates t;

— translate array(t) to T [] where T translates t;

— translate mem(s) to:

• struct G if struct G was generated during unification for this mem; 

• T if s = (0 : t) and T translates t; 

• struct G where G is a new struct definition laying out translated typed 
members of s at appropriate offsets (note this may require unions for 
overlapping members of s and may require padding members for unref- 
erenced struct elements). 
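
The first three translation rules can be sketched as a recursive printer. This is an illustrative sketch with hypothetical names (Kind, Type, translate): it applies the rules literally, so it produces spellings such as "int *" and "int []" but does not handle C declarator syntax for pointers to arrays, and the mem(s) cases are omitted.

```c
#include <stdio.h>

enum Kind { K_CHAR, K_SHORT, K_INT, K_PTR, K_ARRAY };

struct Type { enum Kind kind; const struct Type *elem; };

/* Write the C spelling of t into buf, which holds n bytes. */
void translate(const struct Type *t, char *buf, size_t n)
{
    switch (t->kind) {
    case K_CHAR:  snprintf(buf, n, "char");  break;
    case K_SHORT: snprintf(buf, n, "short"); break;
    case K_INT:   snprintf(buf, n, "int");   break;
    case K_PTR:
    case K_ARRAY: {
        char inner[64];
        translate(t->elem, inner, sizeof inner);   /* T translates t */
        snprintf(buf, n, "%s %s", inner, t->kind == K_PTR ? "*" : "[]");
        break;
    }
    }
}
```

A fuller version would consult the record of structs generated during unification before falling back to laying out a new struct for a mem type.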

6.1 Unwelcome choice in reconstructing arrays and structs

The main problem which arises is due to the expressiveness (and defined storage 
layout) of C’s struct, union and array types compared to those of Java (which 
may only contain another such object via a pointer). As in Fortran it can be 
hard to distinguish array type int [10] [10] from int [100] . Similarly, arrays or 
structs containing other arrays or structs cannot in general be uniquely decoded, 
consider distinguishing objects x1, ..., x3 defined by:

struct S1 { int a; int b[4]; int c; int d[4]; } x1;
struct S2 { int a; int b[4]; } x2[2];
struct S3 { struct S2 c,d; } x3;
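
That these three declarations cannot be told apart from word-level loads alone can be checked with the sketch below. It is our own illustration: load_word is a hypothetical helper that mimics ld.w by copying an int-sized word at a given index, and the comparison assumes the usual layout in which structs made only of ints contain no padding.

```c
#include <stddef.h>
#include <string.h>

struct S1 { int a; int b[4]; int c; int d[4]; };
struct S2 { int a; int b[4]; };
struct S3 { struct S2 c, d; };

/* Read the i-th int-sized word of an object, as a word load would,
   without relying on the object's declared type. */
int load_word(const void *obj, size_t i)
{
    int w;
    memcpy(&w, (const char *)obj + i * sizeof(int), sizeof w);
    return w;
}
```

Given the same ten values, every word index yields the same result for x1, x2 and x3, which is why further information (or user choice) is needed to pick one declaration.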

We are exploring various options for constraint resolution when array index- 
ing and struct selection occurs. Consider code like that discussed in section 4.3: 

        ld.b  0[r0],r1
        ld.b  48[r0],r2
        ld.w  (r5)[r0],r3

There are several possible ways to approximate this data-structure from the 
above information, including: 

union T1 { char a[/*ATLEAST*/49]; int b[/*ATLEAST*/17]; } *r0;
struct T2 { char a; char pad[3]; int b[15]; char c; } *r0; 

The latter is appealing in that additional information, e.g. a 

        ld.b  16[r0],r4

instruction could cause natural, fuller-information, revision to

struct T2 { char a; char pad[3]; int b[3]; } (*r0)[4];

Finally, a current limitation is not exploiting stride information. For example, 
we could use information about the computation of rO to determine restrictions 
on sizes (and hence types) in the pointed-to values represented by r5 in
instructions like ld.w (r5)[r0],r3.

7 Conclusions and further work 

We have described a system which can decompile assembler code at the RTL
level to C. It can successfully create structs both intra- and inter-procedurally
and in doing so can generate code close to natural source form.

We have not discussed the use of local variables stored on the stack rather 
than in registers. A simple extension can manipulate local stacks satisfactorily 
(essentially at the representative power of Morrisett et al.’s [8] Typed Assem- 
bly Language) when local variables are not address-taken. However, there are 
problems with taking addresses of local stack objects in that it can be unclear
as to where the address-taken object ends: a struct of size 8 bytes followed by
a coincidentally contiguously allocated int can be hard to distinguish from a 
struct of size 12 bytes. 

Here it is worth remarking on the assistance given by proof-carrying code [9] , 
particularly when the proof has been generated by a compiler, to our decompiler. 
Many of the questions which gave difficulty for decompilation concerned issues 
like: where arrays live inside a struct, is the data-structure really an array of 
structs instead, or simply where a given array (determined by variable indexing) 
begins and ends. In general these are exactly the points which an accompanying 
proof-of-safety must address. This suggests that we can probably do much better 
at decompilation given the proof part — perhaps this shows that proof-carrying 
code is not a good idea for secret (or deliberately obfuscated) algorithms! 

Note that the decompilation process does not depend intrinsically on C. We 
chose C because of its ability to capture most sequences of machine instructions 
naturally; casts can also represent type cheating in reconstructed source form. 
It also provides a good balance of problem-statement and tractability for this 
initial work. Of course, there are instruction sequences which are not translatable 
to C — the most obvious example is that C does not have label variables and 
so jmp r0 cannot be decompiled (except possibly as a tail-recursive call to a
procedure variable). 

One could imagine a generalisation of this system where compiler transla- 
tion rules (e.g. for Haskell) are made available to the decompiler to reconstruct 
rather more high-level languages. Failure of code to match such rules would in 
general indicate a call to a native (in the Java sense) procedure or that the prof- 
fered code cannot be expressed in the source code represented by the translation 
rules. This clearly links to the “formal methods” view of reverse engineering 
discussed in the introduction for inventing higher-level (than C) notions for ex- 
isting code. Our type-based approach clearly can assist in this process in that it 
is coarser-grain than algebraic semantic approaches yet retains aspects of global 
understanding. Work in progress concerns identification and replacement of ab- 
stract data-types — in BCPL (a fore-runner of C) adjacent words in memory 
are required to have addresses differing by one which causes current translators 
for byte-addressed targets (whether via C or direct to target machine code) to 
generate large numbers of shift-left or shift-right by two instructions. Identify- 
ing “BCPLaddress” as an ADT and re-implementing it could eliminate all such 
shifts with the exception of those caused by users explicitly relying on this part 
of the standard. 

Finally, we turn to performance: we as yet have no experimental results⁵ for
large bodies of code, but the ability to reconstruct datatypes for both iterative 
and recursive procedures is appealing over other techniques. Since the process 
of data-structure reconstruction depends only on finding cycles which in reality 
are likely to be quite short even in large programs, we are optimistic about 
the scalability of the techniques. In common with several type-based systems, 
interprocedural versions seem to come naturally and without great cost. 



⁵ A student project is currently underway.

Abstract. We explore the hierarchy of control induced by successive
transformations into continuation-passing style (CPS) in the presence
of “control delimiters” and “composable continuations”. Specifically, we
investigate the structural operational semantics associated with the CPS
hierarchy.

To this end, we characterize an operational notion of continuation
semantics. We relate it to the traditional CPS transformation and we use it to
account for the control operator shift and the control delimiter reset
operationally. We then transcribe the resulting continuation semantics in
ML, thus obtaining a native and modular implementation of the entire
hierarchy. We illustrate it with several examples, the most significant of
which is layered monads.



1 Introduction 



1.1 Background 

Continuation-passing style (CPS) programs are usually obtained by CPS trans- 
formation. The CPS hierarchy is obtained by iterating the CPS transformation, 
which yields programs whose types obey the following pattern: 

Fun    = Val_0 → Cont_1 → Cont_2 → ... → Cont_n → Ans_n
Cont_1 = Val_1 → Cont_2 → ... → Cont_n → Ans_n
   ⋮
Cont_n = Val_n → Ans_n

In the CPS hierarchy, programs exhibit the familiar pattern of success/failure 
continuations which is pervasive in functional specifications of backtracking.
Enriching CPS with identity and composition makes it possible to simulate
Prolog-style backtracking and also to accumulate results. To simulate Prolog- 
style backtracking, we successively apply the current continuation to all possible 
choices — failing if there are none. To accumulate results, we use the current 
continuation to compute the result of the “remaining computation” and stash 
it away in an accumulator. To combine these two control mechanisms, e.g., to 
accumulate all the possible results of a non-deterministic computation in a list, 
we exploit the natural hierarchy between these two processes: the generation 
should take place in a context where its successive results are accumulated. We 
therefore CPS-transform the generation process (thus making its continuation 
continuation-passing) and we supply the accumulation process as its initial con- 
tinuation [5]. 

The CPS hierarchy thus offers a fitting platform to express hierarchical 
backtracking — at the price indicated by the types above: a quadratic inflation 
of continuations. 

The last level of CPS can be avoided by using the identity continuation and 
the ability to compose continuations. This is often deemed enough when n = 1 
or n = 2 in the type equations above. For higher values of n, this quadratic cost 
can be alleviated with a linguistic device: two new syntactic forms in direct style, 
whose CPS transformation yields the desired effect of initializing a continuation 
with identity and of composing continuations. Initializing a continuation with 
identity is achieved with the control delimiter reset. Composing continuations 
is enabled by the control operator shift which captures the current (delimited) 
continuation and makes it ready to be composed with a subsequent continu- 
ation [5,6]. For comparison, the control operator callcc captures the current 
(unlimited) continuation and makes it ready to replace a subsequent continua- 
tion [17,23]. 

The challenge now is how to implement the CPS hierarchy more directly than 
by repeated CPS transformations and more efficiently than with a definitional 
interpreter [5]. Filinski showed how to implement the first level natively, using 
callcc and one reference cell [12]. In this article, we show how to implement the 
entire hierarchy natively, using callcc and one reference cell per level. 

1.2 Related work 

The CPS hierarchy was identified and advocated by Danvy and Filinski [5], 
who also introduced the corresponding hierarchy of control operators shift_n and
reset_n (one per surrounding continuation). At the same time, but independently
of CPS, Felleisen invented control delimiters [8], initiating a whole area of work 
on composable continuations and hierarchies of control [9,10,16,18,19,21,22,27], 
[31,33,34]. Control delimiters, for example, were instrumental to obtain a full- 
abstraction result [35]. 

All researchers in this new area followed Felleisen and defined their new 
control constructs operationally. They reported a variety of control operators, 
each of these displaying inventiveness in its modus operandi, its description, and 
its implementation.
In contrast, shift and reset are defined by translation into CPS. They have
also proven particularly fruitful: because (we believe) control delimiters and com- 
posable continuations arise naturally in the CPS hierarchy, a number of applica- 
tions of shift and reset were reported through the 90’s [4,12,14,24,25,37], up to 
and including the R5RS [23]. There were only two further studies, however, of 
the CPS hierarchy and its guidelines: Murthy’s, formalizing its type system [28] , 
and Filinski’s, establishing its equivalence with computational monads [13,15]. 

1.3 This work 

The growing number of applications of shift and reset leads one to want to 
combine them. For example, suppose that we want to specialize programs that 
use shift and reset, using type-directed partial evaluation [4]. The problem is 
that type-directed partial evaluation also uses shift and reset, and we would 
like these two uses not to interfere with each other. This kind of application
requires the CPS hierarchy to layer different uses of shift and reset at different
levels. 

It is difficult to implement the CPS hierarchy natively, since the semantics
of built-in constructs cannot be altered. We thus take a novel approach using
operational semantics: we characterize an operational ‘continuation semantics’
and, following Section 1.1, (1) we enrich it with the identity and composition of
continuations as provided by shift and reset, and (2) we transform the result
into a new continuation semantics. The new semantics extends the old one with
a new pair of shift and reset; moreover, it can be natively implemented in
the old semantics, since all the rules in the new semantics except those of the
newly added operators are those of the old semantics, with the addition of one
unchanged component. Iterating this process yields a family of semantics (the
CPS hierarchy) and its native and modular implementation in ML, à la Filinski
[12,13,15]. This general approach provides a native implementation of the new
language constructs.

En passant, to make sure that we account for shift and reset as originally 
defined (i.e., by CPS transformation), we relate our operational notion of con- 
tinuation semantics with the traditional CPS transformation. 

1.4 Applications 

A toy example: The two following computations declare an outer context 1 + [ ].
They also declare a delimited context [50 + [ ]], which is abstracted as
a function denoted by k. This function is successively applied to 0, yielding 50,
and to 10, yielding 60. These two results are added, yielding 110, which is then
plugged into the outer context. In both cases, the overall result is 111.

1 + reset (fn () => 50 + shift (fn k => (k 0) + (k 10)))

1 + let fun k v = 50 + v in (k 0) + (k 10) end

In the first computation, the context is delimited by reset and the delimited 
context is abstracted into a function with shift. The second computation is the 
continuation-passing counterpart of the first one.

More substantial examples: The CPS hierarchy also makes it possible to express
computations like min-max processes or quantifier alternation. For example, the
existential quantifier $\exists v.p(v)$ for a condition p over non-deterministically
generated values v can be implemented as

fun exist v p
  = shift (fn k => if (p v) then true else k ())

where the return value of reset corresponding to the collection is set to false.
Similarly, we can implement the universal quantifier with a function forall.
Using these two functions at different levels, we can write formulae of arbitrary 
quantifier alternation. 
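
The toy example above can be replayed without any built-in control operator by writing it in continuation-passing style. The following Standard ML sketch is ours, not the article's: it assumes a fixed answer type int, and the names cps, return, bind, toy, viaShiftReset, and viaFunction are introduced only for illustration. Its reset and shift follow the CPS definitions of shift and reset by Danvy and Filinski [5,6].

```sml
(* Computations in CPS with a fixed answer type int (an assumption of this sketch). *)
type 'a cps = ('a -> int) -> int

fun return (v : 'a) : 'a cps = fn k => k v
fun bind (m : 'a cps) (f : 'a -> 'b cps) : 'b cps = fn k => m (fn v => f v k)

(* reset: run m with the identity continuation, then pass the result on. *)
fun reset (m : int cps) : int cps = fn k => k (m (fn a => a))

(* shift: reify the current continuation k as a composable function c. *)
fun shift (f : ('a -> int cps) -> int cps) : 'a cps =
  fn k => f (fn w => fn k' => k' (k w)) (fn a => a)

(* 1 + reset (fn () => 50 + shift (fn k => (k 0) + (k 10))) *)
val toy : int cps =
  bind (reset (bind (shift (fn c =>
                       bind (c 0) (fn a =>
                       bind (c 10) (fn b =>
                       return (a + b)))))
                    (fn v => return (50 + v))))
       (fn r => return (1 + r))

val viaShiftReset = toy (fn a => a)

(* The continuation-passing counterpart from the text. *)
val viaFunction = 1 + let fun k v = 50 + v in (k 0) + (k 10) end
```

Both viaShiftReset and viaFunction evaluate to 111, the result stated for the toy example.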

2 Operational Semantics of the CPS Hierarchy 

This section presents a family of operational semantics that can be directly 
transcribed into an implementation. 

Starting with an operational semantics S for ML (Section 2.1), we character- 
ize an operational notion of continuation semantics (Section 2.2). We then relate 
continuation semantics and syntactic CPS transformation, which is a result in 
itself (Sections 2.3 and 2.4). Based on the CPS transformation, we provide a 
semantic account of shift and reset (Section 2.5). 

The semantics L, which is S extended with shift and reset, is no longer
a continuation semantics. We induce two continuation semantics H and I, and
prove that they both simulate the semantics L (Sections 2.6 and 2.7). Moreover,
I is directly implementable in S, in that the S-rules embed into the I-rules with
the addition of one unchanged component. This component can be implemented
by a reference cell. As for the remaining I-rules, they correspond to new control
operators which can be implemented as functions.

The resulting semantics I is a continuation semantics, and thus, generalizing,
we can iterate the whole transformation (Section 2.8). The resulting family of
operational semantics formalizes the CPS hierarchy and is directly implementable
in the initial operational semantics of ML (Section 3).

2.1 Starting semantics S 

We use Harper, Duba, and MacQueen’s “continuation-based operational
semantics” for ML [17] as our starting semantics S.¹ Its syntactic categories are
defined by the following grammar.

$e \in Exp ::= x \mid \ell \mid \lambda x.e \mid e_0\,e_1$   (expressions)
$v \in Val ::= \ell \mid \lambda x.e$   (values)
$k \in Cont ::= \Box \mid k\,e \mid v\,k$   (continuations)

Its inference rules specify a judgment of the form “$k \vdash e \hookrightarrow v$”, which reads
“under the continuation k, evaluating the expression e yields the answer v”
(Table 1).

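¹ For brevity, both the WRONG and LET rules of the original semantics are omitted here.
The WRONG rule specifies the error case and serves in the formulation of type
soundness. The LET rule is only there for ML’s let polymorphism.

(VAL0)  $\dfrac{}{\Box \vdash v \hookrightarrow v}$

(VAL1)  $\dfrac{\Box \vdash k[v_1] \hookrightarrow v}{k \vdash v_1 \hookrightarrow v}$   ($k \neq \Box$)

(FN)    $\dfrac{k[\Box\,e_1] \vdash e_0 \hookrightarrow v}{k \vdash e_0\,e_1 \hookrightarrow v}$   ($e_0$ not a value)

(ARG)   $\dfrac{k[v_0\,\Box] \vdash e_1 \hookrightarrow v}{k \vdash v_0\,e_1 \hookrightarrow v}$   ($e_1$ not a value)

(BETA)  $\dfrac{k \vdash [v_1/x]e \hookrightarrow v}{k \vdash (\lambda x.e)\,v_1 \hookrightarrow v}$
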
Table 1. An operational continuation semantics 
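
Because every rule of Table 1 has at most one premise, the rules can be read as a tail-recursive evaluator. The following Standard ML sketch is our own transcription, not code from the article. The names exp, cont, plug, compose, subst, and eval are illustrative; constants are assumed to be integers; and substitution assumes closed programs, so that no variable capture can occur.

```sml
datatype exp = Var of string | Const of int | Lam of string * exp | App of exp * exp
datatype cont = Hole | KFn of cont * exp | KArg of exp * cont

exception Stuck

fun isValue (Const _) = true
  | isValue (Lam _) = true
  | isValue _ = false

(* k[e]: fill the hole of k with an expression *)
fun plug Hole e = e
  | plug (KFn (k, e1)) e = App (plug k e, e1)
  | plug (KArg (v0, k)) e = App (v0, plug k e)

(* k[k']: fill the hole of k with a continuation *)
fun compose Hole k' = k'
  | compose (KFn (k, e1)) k' = KFn (compose k k', e1)
  | compose (KArg (v0, k)) k' = KArg (v0, compose k k')

(* [v/x]e, assuming v is closed *)
fun subst v x (Var y) = if x = y then v else Var y
  | subst _ _ (Const n) = Const n
  | subst v x (Lam (y, b)) = if x = y then Lam (y, b) else Lam (y, subst v x b)
  | subst v x (App (e0, e1)) = App (subst v x e0, subst v x e1)

(* One clause per rule; every recursive call is a tail call. *)
fun eval (k, e) =
  if isValue e then
    (case k of
       Hole => e                                   (* VAL0 *)
     | _ => eval (Hole, plug k e))                 (* VAL1 *)
  else
    (case e of
       App (e0, e1) =>
         if not (isValue e0) then
           eval (compose k (KFn (Hole, e1)), e0)   (* FN *)
         else if not (isValue e1) then
           eval (compose k (KArg (e0, Hole)), e1)  (* ARG *)
         else
           (case e0 of
              Lam (x, b) => eval (k, subst e1 x b) (* BETA *)
            | _ => raise Stuck)
     | _ => raise Stuck)                           (* free variable *)

val identity = Lam ("x", Var "x")
val sample = App (App (identity, Lam ("y", Var "y")), Const 3)
val answer = eval (Hole, sample)
```

Evaluating sample proceeds by FN, BETA, VAL1, BETA, and VAL0, and answer is Const 3.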



A continuation k can be thought of as an expression with precisely one “hole”
□ in it. We write k[e] and k[k'] to denote the expression and the continuation
obtained by filling the hole in k with an expression e and a continuation k',
respectively; k[k'] is the “composition” of k with k'.

We are mainly interested in the dynamic semantics of shift and reset, and 
thus we do not present typing rules and the related soundness proof, which can 
be adapted from the work of Harper, Duba, and MacQueen [17] and of Gunter, 
Remy, and Riecke [16]. We rely on the static type system of our implementation 
language, ML, for the type soundness (Section 3). 



2.2 An operational notion of continuation semantics 

Operational semantics give rise to derivation trees. We define a branchless se- 
mantics as an operational semantics whose rules have one premise at most. 
Such a semantics gives rise to branchless derivation trees, i.e., lists. Note that 
a branchless semantics directly corresponds to a reduction semantics, where re- 
duction proceeds from the conclusion of a rule to the premise, or to the final 
result if the rule has no premise. Staying in the world of branchless evaluation 
semantics makes it easy to refer to a complete computation (as a judgment) as 
well as a single reduction (as a rule instance) . 

A continuation semantics, like the one in Table 1, is branchless: the contin- 
uation component in its judgments keeps track of the remaining branches in a 
corresponding direct-style derivation tree; it can be regarded as the stack used 
to traverse this derivation tree. 



2.3 CPS transformation 

Previous studies of the CPS hierarchy build on the CPS transformation. To
justify our study of control operators with the continuation semantics of Table
1, we adapt the call-by-value CPS transformation to its expressions, values, and
continuations.

$[\![x]\!]_{Exp} = \lambda\kappa.\kappa\,x$
$[\![v]\!]_{Exp} = \lambda\kappa.\kappa\,[\![v]\!]_{Val}$
$[\![e_0\,e_1]\!]_{Exp} = \lambda\kappa.[\![e_0]\!]_{Exp}\,(\lambda v_0.[\![e_1]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,\kappa))$

$[\![\ell]\!]_{Val} = \ell$
$[\![\lambda x.e]\!]_{Val} = \lambda x.[\![e]\!]_{Exp}$

$[\![\Box]\!]_{Cont} = \lambda v.\lambda\kappa.\kappa\,v$
$[\![k\,e_1]\!]_{Cont} = \lambda v.\lambda\kappa.[\![k]\!]_{Cont}\,v\,(\lambda v_0.[\![e_1]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,\kappa))$
$[\![v_0\,k]\!]_{Cont} = \lambda v.\lambda\kappa.[\![k]\!]_{Cont}\,v\,(\lambda v_1.[\![v_0]\!]_{Val}\,v_1\,\kappa)$

The rationale for $[\![\cdot]\!]_{Cont}$ is that the “hole” □ in the continuation is an
abstraction over a value, as captured in the following lemma, where “$=_{\beta\eta}$” denotes
βη-convertibility.



Lemma 1. For all $k \in Cont$ and $v \in Val$, $[\![k[v]]\!]_{Exp} =_{\beta\eta} [\![k]\!]_{Cont}\,[\![v]\!]_{Val}$.

Proof. By structural induction on k. 



2.4 Soundness and completeness of the operational semantics 

The following theorem connects the continuation semantics of Section 2.1 and 
the CPS transformation of Section 2.3. 

Theorem 1. For all $k \in Cont$, $e \in Exp$, and $v \in Val$, if $k \vdash e \hookrightarrow v$ then
$[\![e]\!]_{Exp}\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a)) =_{\beta\eta} [\![v]\!]_{Val}$.

Proof. By rule induction on $k \vdash e \hookrightarrow v$.

Corollary 1. For all $e \in Exp$ and $v \in Val$, if $\Box \vdash e \hookrightarrow v$ then
$[\![e]\!]_{Exp}\,(\lambda w.w) =_{\beta\eta} [\![v]\!]_{Val}$.

Because the term $[\![e]\!]_{Exp}\,(\lambda w.w)$ is convertible to a value $[\![v]\!]_{Val}$, it must reduce
to a value that is equal to $[\![v]\!]_{Val}$ modulo βη-conversion under normal-order
evaluation (by the Normalization theorem), and thus also under applicative-order
evaluation (by Plotkin’s Indifference theorem [29]). The operational semantics
is thus sound with respect to the call-by-value semantics defined by the CPS
transformation.

Proving completeness requires a close correspondence between the rules and
the translated terms, and as often, “administrative redexes” in the CPS
transformation get in the way. To prove completeness (i.e., that if the evaluation of the
CPS form of a term e with an identity continuation leads to a value v, then
$\Box \vdash e \hookrightarrow v$), we successfully adopted Danvy and Filinski’s one-pass CPS
transformation [6].

These soundness and completeness results are not surprising. One can also 
obtain them by proving the equivalence of the continuation semantics and a 
direct semantics, and then by using Plotkin’s Simulation theorem [29]. A more 
immediate connection between continuation semantics and CPS transformation, 
however, provides the basic framework for adding control operators. 
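
As a small sanity check of Corollary 1 (a worked example of ours, not taken from the original text), take $e = (\lambda x.x)\,\ell$. Operationally, $\Box \vdash (\lambda x.x)\,\ell \hookrightarrow \ell$ follows from BETA and then VAL0. On the CPS side, we assume the usual call-by-value clauses $[\![v]\!]_{Exp} = \lambda\kappa.\kappa\,[\![v]\!]_{Val}$, $[\![x]\!]_{Exp} = \lambda\kappa.\kappa\,x$, $[\![\lambda x.e]\!]_{Val} = \lambda x.[\![e]\!]_{Exp}$, and $[\![e_0\,e_1]\!]_{Exp} = \lambda\kappa.[\![e_0]\!]_{Exp}\,(\lambda v_0.[\![e_1]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,\kappa))$. Unfolding and β-reducing then gives:

```latex
\begin{align*}
[\![(\lambda x.x)\,\ell]\!]_{Exp}\,(\lambda w.w)
 &= (\lambda\kappa.[\![\lambda x.x]\!]_{Exp}\,(\lambda v_0.[\![\ell]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,\kappa)))\,(\lambda w.w) \\
 &\to_\beta [\![\lambda x.x]\!]_{Exp}\,(\lambda v_0.[\![\ell]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,(\lambda w.w))) \\
 &= (\lambda\kappa.\kappa\,(\lambda x.\lambda\kappa'.\kappa'\,x))\,(\lambda v_0.[\![\ell]\!]_{Exp}\,(\lambda v_1.v_0\,v_1\,(\lambda w.w))) \\
 &\to_\beta^{*} [\![\ell]\!]_{Exp}\,(\lambda v_1.(\lambda x.\lambda\kappa'.\kappa'\,x)\,v_1\,(\lambda w.w)) \\
 &= (\lambda\kappa.\kappa\,\ell)\,(\lambda v_1.(\lambda x.\lambda\kappa'.\kappa'\,x)\,v_1\,(\lambda w.w)) \\
 &\to_\beta^{*} (\lambda x.\lambda\kappa'.\kappa'\,x)\,\ell\,(\lambda w.w) \\
 &\to_\beta^{*} (\lambda w.w)\,\ell \;\to_\beta\; \ell \;=\; [\![\ell]\!]_{Val}
\end{align*}
```

The CPS term applied to the identity continuation thus reduces to the translation of the operational answer, as the corollary predicts.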



2.5 An operational account of shift and reset 

A control operator “reifies” a continuation k into a function $f_k$. In terms of the
CPS transformation, such a function appears as a λ-term:

$\Lambda_k = \lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a)$

Correspondingly, in the continuation semantics, we need to find out how to
invoke such a function $f_k$ to be able to use a reified continuation.

Let us fix a continuation k. For any given value v, the corresponding v' such
that $k \vdash v \hookrightarrow v'$ is unique, if it exists. This suggests defining a function $f_k$
for every continuation k as follows.

$f_k\,v = v' \iff k \vdash v \hookrightarrow v'$

And this is justified by the CPS transformation: as a corollary of Theorem 1
when e = v, the term $\Lambda_k\,[\![v]\!]_{Val}$ is βη-convertible to $[\![v']\!]_{Val}$:

$(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a))\,[\![v]\!]_{Val} =_{\beta\eta} [\![v]\!]_{Exp}\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a)) =_{\beta\eta} [\![v']\!]_{Val}$

Now we are ready to introduce the control operators shift and reset by
adding the following expression forms and the corresponding rules.

e ::= ... | reset e | shift c.e | pushcc(e, k)

Shift and reset are defined by their CPS transformation [5,6]:

$[\![\mathsf{shift}\ c.e]\!]_{Exp} = \lambda\kappa.(\lambda c.[\![e]\!]_{Exp}\,(\lambda a.a))\,(\lambda w.\lambda\kappa'.\kappa'\,(\kappa\,w))$   (composition of continuations)

$[\![\mathsf{reset}\ e]\!]_{Exp} = \lambda\kappa.\kappa\,([\![e]\!]_{Exp}\,(\lambda a.a))$   (identity continuation)

In both shift c.e and reset e, the type of e should be the same, though not
necessarily the same as the final result type. Thus any shift-expression must be
delimited by a corresponding reset or a shift, a hidden restriction that cannot
be easily expressed in this translation. We address it in Section 2.6.

The translations of shift and reset are not in CPS because they compose
continuations. This programming pattern is abstracted by the meta-control
operator pushcc:

$[\![\mathsf{pushcc}(e, k)]\!]_{Exp} = \lambda\kappa.\kappa\,([\![e]\!]_{Exp}\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a)))$

Pushcc is a meta-control operator because its expression form contains
continuations. It therefore can only be used to define other control operators.

Correspondingly, in the operational semantics, we can express the composition
of the functions $f_k$ and $f_{k'}$ related to continuations k and k' by adding the rule
pushcc (Table 2). As for shift and reset, they are defined by the rules shift and
reset in terms of pushcc.

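Let us resume the inductive proof of Theorem 1 for the three new rules. We
only reproduce the most interesting one here, i.e., pushcc.

Proof (excerpt).

The induction hypotheses for the rule pushcc (Table 2) read:

$[\![e]\!]_{Exp}\,(\lambda w.[\![k']\!]_{Cont}\,w\,(\lambda a.a)) =_{\beta\eta} [\![v']\!]_{Val}$   (i1)

$[\![v']\!]_{Exp}\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a)) =_{\beta\eta} [\![k]\!]_{Cont}\,[\![v']\!]_{Val}\,(\lambda a.a) =_{\beta\eta} [\![v]\!]_{Val}$   (i2)

(shift)   $\dfrac{\Box \vdash (\lambda c.e)\,(\lambda w.\mathsf{pushcc}(w, k)) \hookrightarrow v}{k \vdash \mathsf{shift}\ c.e \hookrightarrow v}$

(reset)   $\dfrac{k \vdash \mathsf{pushcc}(e, \Box) \hookrightarrow v}{k \vdash \mathsf{reset}\ e \hookrightarrow v}$

(pushcc)  $\dfrac{k' \vdash e \hookrightarrow v' \qquad k \vdash v' \hookrightarrow v}{k \vdash \mathsf{pushcc}(e, k') \hookrightarrow v}$

Table 2. Definitional rules for shift and reset

Now, we have

$[\![\mathsf{pushcc}(e, k')]\!]_{Exp}\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a))$
$=_{\beta\eta} (\lambda\kappa.\kappa\,([\![e]\!]_{Exp}\,(\lambda w.[\![k']\!]_{Cont}\,w\,(\lambda a.a))))\,(\lambda w.[\![k]\!]_{Cont}\,w\,(\lambda a.a))$
$=_{\beta\eta} [\![k]\!]_{Cont}\,[\![v']\!]_{Val}\,(\lambda a.a)$, by (i1)
$=_{\beta\eta} [\![v]\!]_{Val}$, by (i2)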

Theorem 1, and consequently Corollary 1, still hold for the operational se- 
mantics and the CPS transformation extended with the control operators. There- 
fore, the operational semantics gives the same definition for the control operators 
as those given by the CPS transformation. 
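
To make the three definitional rules concrete, the following Standard ML sketch (ours, not the article's) transcribes them into an evaluator for the extended language. All names (exp, cont, plug, compose, subst, eval, discard, resume) are illustrative, constants are assumed to be integers, and programs are assumed closed. The pushcc clause has two premises, so it makes a non-tail call to eval: the evaluator is no longer a simple loop.

```sml
datatype exp = Var of string | Const of int | Lam of string * exp | App of exp * exp
             | Shift of string * exp | Reset of exp | PushCC of exp * cont
and cont = Hole | KFn of cont * exp | KArg of exp * cont

exception Stuck

fun isValue (Const _) = true | isValue (Lam _) = true | isValue _ = false

fun plug Hole e = e
  | plug (KFn (k, e1)) e = App (plug k e, e1)
  | plug (KArg (v0, k)) e = App (v0, plug k e)

fun compose Hole k' = k'
  | compose (KFn (k, e1)) k' = KFn (compose k k', e1)
  | compose (KArg (v0, k)) k' = KArg (v0, compose k k')

(* [v/x]e for closed v; saved continuations are closed and left untouched *)
fun subst v x e =
  case e of
    Var y => if x = y then v else e
  | Const _ => e
  | Lam (y, b) => if x = y then e else Lam (y, subst v x b)
  | App (e0, e1) => App (subst v x e0, subst v x e1)
  | Shift (c, b) => if x = c then e else Shift (c, subst v x b)
  | Reset b => Reset (subst v x b)
  | PushCC (b, k) => PushCC (subst v x b, k)

fun eval (k, e) =
  case e of
    Shift (c, b) =>                                         (* rule shift *)
      eval (Hole, App (Lam (c, b), Lam ("w", PushCC (Var "w", k))))
  | Reset b => eval (k, PushCC (b, Hole))                   (* rule reset *)
  | PushCC (b, k') =>                                       (* rule pushcc: two premises *)
      let val v' = eval (k', b) in eval (k, v') end
  | App (e0, e1) =>
      if not (isValue e0) then eval (compose k (KFn (Hole, e1)), e0)        (* FN *)
      else if not (isValue e1) then eval (compose k (KArg (e0, Hole)), e1)  (* ARG *)
      else (case e0 of Lam (x, b) => eval (k, subst e1 x b)                 (* BETA *)
                     | _ => raise Stuck)
  | Var _ => raise Stuck
  | _ => (case k of Hole => e                               (* VAL0 *)
                  | _ => eval (Hole, plug k e))             (* VAL1 *)

(* reset ((fn x => 1) (shift c. 2)): the captured context is discarded *)
val discard = Reset (App (Lam ("x", Const 1), Shift ("c", Const 2)))
(* reset ((fn x => 1) (shift c. c 2)): the captured context is resumed *)
val resume = Reset (App (Lam ("x", Const 1), Shift ("c", App (Var "c", Const 2))))
```

Here eval (Hole, discard) yields Const 2, because the delimited context (fn x => 1) [ ] is dropped, whereas eval (Hole, resume) yields Const 1, because the reified context is applied to 2.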

2.6 A continuation semantics for shift and reset 

With the addition of pushcc, the semantics is no longer branchless. To obtain an 
equivalent branchless semantics, we need to use the idea of the CPS transfor- 
mation, i.e., flattening derivation trees by remembering branching computations 
(continuations k in pushcc) in a stack. 

We thus induce another semantics H from the extended semantics, referred to
as L. For clarity, we subscript rules and domains by the semantics they belong
to. A judgment in H is of the form $\kappa \vdash_H e \hookrightarrow v$, where $e \in Exp_L$ (which
can be one of the forms introduced by the control operators), but $\kappa$ ranges over a
new domain of “global” continuations: $\kappa \in Cont_H = Cont_L \times (Cont_L\ \mathit{list}) = Cont_L^{+}$.
The global continuations (of type $Cont_H$) are always non-empty lists of
continuations, whose head is the current active continuation, and whose tail is
a stack of saved continuations.

The H-rules are given in Table 3. Most of them are simply the corresponding
L-rules with the stack ks carried around unchanged. The interesting rules,
$\mathrm{pushcc}_H$ and $\mathrm{VAL0}_H^{cons}$, function as the branching rule $\mathrm{pushcc}_L$.

The semantics H is branchless. We would like to show that it correctly
accounts for L, i.e., $\forall k, e, v.\ (k \vdash_L e \hookrightarrow v) \iff (k :: nil \vdash_H e \hookrightarrow v)$.

Theorem 2. For all $k \in Cont_L$, $e \in Exp_L$ and $v \in Val_L$, if $k \vdash_L e \hookrightarrow v$, then
$\forall ks \in Cont_L^{*}, v' \in Val_L.\ (\Box :: ks \vdash_H v \hookrightarrow v') \Rightarrow (k :: ks \vdash_H e \hookrightarrow v')$.

Proof. By rule induction on $k \vdash_L e \hookrightarrow v$.

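For ks = nil, using rule $\mathrm{VAL0}_H^{nil}$, we obtain the following corollary.

Corollary 2. For all $k \in Cont_L$, $e \in Exp_L$ and $v \in Val_L$, if $k \vdash_L e \hookrightarrow v$ then
$k :: nil \vdash_H e \hookrightarrow v$.

($\mathrm{VAL0}_H^{nil}$)   $\dfrac{}{\Box :: nil \vdash_H v \hookrightarrow v}$

($\mathrm{VAL0}_H^{cons}$)  $\dfrac{k :: ks \vdash_H v' \hookrightarrow v}{\Box :: k :: ks \vdash_H v' \hookrightarrow v}$

($\mathrm{VAL1}_H$)  $\dfrac{\Box :: ks \vdash_H k[v_1] \hookrightarrow v}{k :: ks \vdash_H v_1 \hookrightarrow v}$   ($k \neq \Box$)

($\mathrm{FN}_H$)    $\dfrac{k[\Box\,e_1] :: ks \vdash_H e_0 \hookrightarrow v}{k :: ks \vdash_H e_0\,e_1 \hookrightarrow v}$   ($e_0$ not a value)

($\mathrm{ARG}_H$)   $\dfrac{k[v_0\,\Box] :: ks \vdash_H e_1 \hookrightarrow v}{k :: ks \vdash_H v_0\,e_1 \hookrightarrow v}$   ($e_1$ not a value)

($\mathrm{BETA}_H$)  $\dfrac{k :: ks \vdash_H [v_1/x]e \hookrightarrow v}{k :: ks \vdash_H (\lambda x.e)\,v_1 \hookrightarrow v}$

($\mathrm{shift}_H$)  $\dfrac{\Box :: ks \vdash_H (\lambda c.e)\,(\lambda w.\mathsf{pushcc}(w, k)) \hookrightarrow v}{k :: ks \vdash_H \mathsf{shift}\ c.e \hookrightarrow v}$

($\mathrm{reset}_H$)  $\dfrac{k :: ks \vdash_H \mathsf{pushcc}(e, \Box) \hookrightarrow v}{k :: ks \vdash_H \mathsf{reset}\ e \hookrightarrow v}$

($\mathrm{pushcc}_H$) $\dfrac{k :: k' :: ks \vdash_H e \hookrightarrow v}{k' :: ks \vdash_H \mathsf{pushcc}(e, k) \hookrightarrow v}$
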

Table 3. An operational continuation semantics for shift and reset 
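
Since H is branchless, a complete derivation is a list of judgments. The following worked example is ours, added for illustration, and uses the rule names as read from Table 3. It evaluates $\mathsf{reset}\ E$ with $E = (\lambda x.\ell_1)\,(\mathsf{shift}\ c.\,c\ \ell_2)$, writing $K$ for the captured continuation $(\lambda x.\ell_1)\,\Box$. Each judgment follows from the one below it by the rule named on its right; the last line is an axiom.

```latex
\begin{array}{ll}
\Box :: nil \vdash_H \mathsf{reset}\ E \hookrightarrow \ell_1 & (\mathrm{reset}_H) \\
\Box :: nil \vdash_H \mathsf{pushcc}(E, \Box) \hookrightarrow \ell_1 & (\mathrm{pushcc}_H) \\
\Box :: \Box :: nil \vdash_H (\lambda x.\ell_1)\,(\mathsf{shift}\ c.\,c\ \ell_2) \hookrightarrow \ell_1 & (\mathrm{ARG}_H) \\
K :: \Box :: nil \vdash_H \mathsf{shift}\ c.\,c\ \ell_2 \hookrightarrow \ell_1 & (\mathrm{shift}_H) \\
\Box :: \Box :: nil \vdash_H (\lambda c.\,c\ \ell_2)\,(\lambda w.\mathsf{pushcc}(w, K)) \hookrightarrow \ell_1 & (\mathrm{BETA}_H) \\
\Box :: \Box :: nil \vdash_H (\lambda w.\mathsf{pushcc}(w, K))\ \ell_2 \hookrightarrow \ell_1 & (\mathrm{BETA}_H) \\
\Box :: \Box :: nil \vdash_H \mathsf{pushcc}(\ell_2, K) \hookrightarrow \ell_1 & (\mathrm{pushcc}_H) \\
K :: \Box :: \Box :: nil \vdash_H \ell_2 \hookrightarrow \ell_1 & (\mathrm{VAL1}_H) \\
\Box :: \Box :: \Box :: nil \vdash_H (\lambda x.\ell_1)\ \ell_2 \hookrightarrow \ell_1 & (\mathrm{BETA}_H) \\
\Box :: \Box :: \Box :: nil \vdash_H \ell_1 \hookrightarrow \ell_1 & (\mathrm{VAL0}_H^{cons}) \\
\Box :: \Box :: nil \vdash_H \ell_1 \hookrightarrow \ell_1 & (\mathrm{VAL0}_H^{cons}) \\
\Box :: nil \vdash_H \ell_1 \hookrightarrow \ell_1 & (\mathrm{VAL0}_H^{nil})
\end{array}
```

The stack grows at each $\mathrm{pushcc}_H$ step and shrinks at each $\mathrm{VAL0}_H^{cons}$ step, which is how the two premises of the branching rule pushcc of L are sequenced in a single list.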



For the inverse direction, we need to refer to the derivation more explicitly:
we use $\preceq$ to denote the sub-derivation relation, and we write $D : J$ if D is a
derivation ending with the judgment J.

Theorem 3. For all $ks \in Cont_L^{*}$, $k \in Cont_L$, $e \in Exp_L$, $v' \in Val_L$, if
$D : k :: ks \vdash_H e \hookrightarrow v'$, then

$\exists v \in Val_L, D'.\ (k \vdash_L e \hookrightarrow v) \wedge (D' \preceq D) \wedge D' : (\Box :: ks \vdash_H v \hookrightarrow v')$.

Proof. By strong induction on the derivation D.

For ks = nil in Theorem 3, we notice that the only possible derivation for D'
is one-step, using rule $\mathrm{VAL0}_H^{nil}$, so the witness v is v', and the following corollary
holds.

Corollary 3. For all $k \in Cont_L$, $e \in Exp_L$ and $v \in Val_L$, if $k :: nil \vdash_H e \hookrightarrow v$
then $k \vdash_L e \hookrightarrow v$.

Together, Corollary 2 and Corollary 3 show that the branchless semantics H
simulates the semantics L. H can be implemented by a definitional interpreter
as before [5]. Our goal, however, is to implement the control operators natively
as functions (using first-class continuations and cells, like Filinski [12]). The
semantics of the built-in constructs cannot be altered in such a setting, thereby
preventing us from implementing the crucial rule $\mathrm{VAL0}_H^{cons}$, which is enacted when the
active continuation (of type $Cont_L$) is already the identity. Such behavior should be
put into the continuation: instead of initializing it with an identity continuation
□, we should initialize it with an operation to resume the top continuation from
the stack. The corresponding transformation of L is described in Section 2.7.

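Now we can come back to the typing problem of shift. The requirement that
all shift-expressions must be enclosed by a corresponding reset or shift can
be easily manifested in the semantics H by adding a side condition to the rule
$\mathrm{shift}_H$ that the stack should not be empty. (This makes the definition of shift
partial; when the stack is empty, we obtain a run-time error.)

($\mathrm{shift}_H$)  $\dfrac{\Box :: ks \vdash_H (\lambda k'.e)\,(\lambda w.\mathsf{pushcc}(w, k)) \hookrightarrow v}{k :: ks \vdash_H \mathsf{shift}\ k'.e \hookrightarrow v}$   ($ks \neq nil$)

2.7 An implementable continuation semantics for shift and reset

The implementable semantics I is also induced from the semantics L with
$Cont_I = Cont_H = Cont_L^{+}$. We introduce a new operator popcc:

e ::= ... | popcc(e)

Intuitively, this operator pops the continuation stack and sends its operand
to the popped continuation. Now, the initial continuation can be defined as
$id_{pop} = \mathsf{popcc}(\Box)$, which replaces □ in rules $\mathrm{shift}_H$ and $\mathrm{reset}_H$, thus eliminating
the need for the rule $\mathrm{VAL0}_H^{cons}$.

In Table 4, the I-rules are the same as the H-rules except for $\mathrm{VAL0}_I$ (replacing
$\mathrm{VAL0}_H^{nil}$ and $\mathrm{VAL0}_H^{cons}$), $\mathrm{shift}_I$, $\mathrm{reset}_I$, and $\mathrm{popcc}_I$.

($\mathrm{VAL0}_I$)   $\dfrac{}{\Box :: ks \vdash_I v \hookrightarrow v}$

($\mathrm{VAL1}_I$)   $\dfrac{\Box :: ks \vdash_I k[v_1] \hookrightarrow v}{k :: ks \vdash_I v_1 \hookrightarrow v}$   ($k \neq \Box$)

($\mathrm{FN}_I$)     $\dfrac{k[\Box\,e_1] :: ks \vdash_I e_0 \hookrightarrow v}{k :: ks \vdash_I e_0\,e_1 \hookrightarrow v}$   ($e_0$ not a value)

($\mathrm{ARG}_I$)    $\dfrac{k[v_0\,\Box] :: ks \vdash_I e_1 \hookrightarrow v}{k :: ks \vdash_I v_0\,e_1 \hookrightarrow v}$   ($e_1$ not a value)

($\mathrm{BETA}_I$)   $\dfrac{k :: ks \vdash_I [v_1/x]e \hookrightarrow v}{k :: ks \vdash_I (\lambda x.e)\,v_1 \hookrightarrow v}$

($\mathrm{shift}_I$)  $\dfrac{id_{pop} :: ks \vdash_I (\lambda c.e)\,(\lambda w.\mathsf{pushcc}(w, k)) \hookrightarrow v}{k :: ks \vdash_I \mathsf{shift}\ c.e \hookrightarrow v}$   ($ks \neq nil$)

($\mathrm{reset}_I$)  $\dfrac{k :: ks \vdash_I \mathsf{pushcc}(e, id_{pop}) \hookrightarrow v}{k :: ks \vdash_I \mathsf{reset}\ e \hookrightarrow v}$

($\mathrm{pushcc}_I$) $\dfrac{k :: k' :: ks \vdash_I e \hookrightarrow v}{k' :: ks \vdash_I \mathsf{pushcc}(e, k) \hookrightarrow v}$

($\mathrm{popcc}_I$)  $\dfrac{ks \vdash_I v \hookrightarrow v'}{k' :: ks \vdash_I \mathsf{popcc}(v) \hookrightarrow v'}$   ($ks \neq nil$)

Table 4. An implementable semantics for shift and reset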

Semantics I simulates semantics L, as shown by the following theorem.

Theorem 4. For all $e \in Exp_L$ and $v \in Val_L$,
$(\Box :: nil \vdash_I e \hookrightarrow v) \iff (\Box :: nil \vdash_H e \hookrightarrow v)$.

Proof. By two straightforward inductions.

This new semantics I has two properties:

(1) It is branchless. More specifically, for any intermediate judgment
$\kappa \vdash_I e \hookrightarrow v$ in a proof tree, the rest of the computation after e is evaluated is totally
captured in the global continuation $\kappa$. We can thus iterate the above process to
add control operators at subsequent levels.

(2) It can be directly implemented in the starting semantics S using references
and first-class continuations in the following way:² the head of the global
continuation $\kappa$ is the current continuation, while the tail is stored in a reference
cell. All the S-rules automatically ‘extend’ to the corresponding I-rules, without
touching the reference cell; the new rules defining the control operators can then
be directly implemented by encoding the four constants shift, reset, pushcc
and popcc as functions, using callcc to capture the current continuation and
throw to restore it.

2.8 An inductive construction of the CPS hierarchy 

The semantic transformation (from Section 2.5 to Section 2.7) can be generalized
and iterated: at each step, we transform an input semantics $S_i$ into an output
semantics $S_{i+1}$ that preserves certain inductive conditions (such as
“branchlessness”). The operational continuation semantics displayed in Table 1 satisfies
these inductive conditions, and we used it as the starting semantics $S_1$.





[Figure: diagram of one transformation step. Adding shift$_i$/reset$_i$ to $S_i$ yields $L_i$; from $L_i$ we obtain $H_i$ (branchless) and $I_i = S_{i+1}$ (branchless and directly implementable).]

Fig. 1. A single transformation (from level i to level i + 1)


Figure 1 summarizes the development of a single transformation: we start
with the branchless semantics $S_i$ of level i. Adding shift$_i$ and reset$_i$ with
related inference rules yields the semantics $L_i$, which is no longer branchless (see
Section 2.5). Then we replace the global continuation of this level by a stack
of such continuations, which forms the global continuation of the next level,
and we obtain a semantics $H_i$ where we restore the branchlessness property (see
Section 2.6). Since $H_i$ is not directly implementable in $S_i$, we apply another
transformation to $L_i$ to obtain a semantics $I_i$, which is both branchless and
directly implementable (see Section 2.7). Semantics $I_i$ simulates $H_i$, which in turn
simulates $L_i$. With its newly introduced control operators shift$_i$ and reset$_i$, $I_i$
satisfies the inductive conditions. Therefore, we can use it as the semantics $S_{i+1}$
for the next level of the hierarchy.

² We did not put references and callcc in the starting semantics and we only use them
for the implementation. In fact, making references available to the user causes no
problem, whereas callcc interferes with shift and reset.

For space reasons, the rest of this section is omitted, but it is available in
the extended version of this article [7].

3 An Implementation of the CPS Hierarchy 

We implement the CPS hierarchy by transcribing the transformation of Section
2.8 in Standard ML of New Jersey [1], using the structure SMLofNJ.Cont which
provides callcc and throw. The implementation uses a signature SHIFT_RESET to
specify the operations provided by a semantics S for the user and for the
construction of the next control level, a structure innermost_level to model the first
level in the hierarchy (which thus provides no control operators), and a functor
sr_outer to construct next-level control operators, parameterized by the answer
type ans and by the control level inner immediately preceding it. Essentially, the
signature SHIFT_RESET corresponds to the inductive conditions for the semantics
S, and the functor sr_outer corresponds to the transformation from a semantics
$S = S_i$ to the next-level semantics $I = S_{i+1}$.

The implementations of control operators are thus hidden inside the module
system, and they are accessed via the name of the structure that corresponds
to their level. Having devised an ordering of the control effects, a user then
implements it through the order of functor applications. Hierarchical occurrences
of shift and reset are thus no longer referred to by their relative index, which
had been criticized in the literature [16,27].

We implemented the functor sr_outer by transcribing line-by-line the added
semantic rules in semantics I (four new functions, one per operator) and the
definition of the constant $id_{pop}$.³ We also use two auxiliary functions: a
function replace_gcont, used implicitly in the semantics, captures and replaces the
current global continuation, and a function cont2gcont, required by the
inductive conditions for semantics S, converts a first-class continuation to a global
continuation. The code is thus very concise: the pretty-printed program defining
innermost_level and sr_outer takes about 40 lines of ML code (Figure 2).

We also provide a functor for the usual first level of control operators (shift$_1$
and reset$_1$):

functor initial_control_level (type ans) : SHIFT_RESET
  = sr_outer (type ans = ans structure inner = innermost_level)

Specializing this functor for the first level of the CPS hierarchy yields a result 
similar to Filinski’s implementation of shift and reset [12]. The main difference 
is that here we use an explicit stack of continuations whereas Filinski uses an 
implicit one through functional abstraction. (An analogy: one can represent en- 
vironments in an interpreter as a list or as a function.) 
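
For comparison, a one-level implementation in which the saved continuations are held implicitly, through a single cell containing a function, can be sketched as follows. This is our reconstruction in the style of the implementation attributed to Filinski [12], not code from this article. It assumes SML/NJ and its SMLofNJ.Cont structure; the names Control, void, coerce, escape, mk, abort, C, and r are illustrative.

```sml
(* A sketch assuming SML/NJ: the meta-continuation lives in the cell mk as a function. *)
functor Control (type ans) =
struct
  (* void has no values; coerce turns a result of type void into any type *)
  datatype void = VOID of void
  fun coerce (VOID v) = coerce v

  (* escape passes the current continuation to f as a non-returning function *)
  fun escape f =
    SMLofNJ.Cont.callcc (fn k => f (fn x => SMLofNJ.Cont.throw k x))

  exception MissingReset
  val mk : (ans -> void) ref = ref (fn _ => raise MissingReset)
  fun abort x = coerce (!mk x)

  (* reset installs a new meta-continuation that first restores the old one *)
  fun reset t =
    escape (fn k =>
      let val m = !mk
      in mk := (fn r => (mk := m; k r)); abort (t ()) end)

  (* shift reifies the current continuation as a function delimited by reset *)
  fun shift h =
    escape (fn k => abort (h (fn v => reset (fn () => coerce (k v)))))
end

structure C = Control (type ans = int)

(* the toy example of Section 1.4 *)
val r = 1 + C.reset (fn () => 50 + C.shift (fn k => k 0 + k 10))
```

Under these assumptions r is 111. The explicit list held in the cell stack of Figure 2 plays the role that the function stored in mk plays here.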

³ We use the function SMLofNJ.Cont.isolate to coerce a non-returning function to a
continuation. This function can be defined as follows.

fun isolate f = callcc (fn x => f (callcc (fn y => throw x y)))

open SMLofNJ.Cont                            (* provides cont, callcc, throw, isolate *)

signature SHIFT_RESET                        (* control level i *)
= sig
    type answer                              (* answer type of level i *)
    val reset : (unit -> answer) -> answer
    val shift : (('a -> answer) -> answer) -> 'a
    type 'a gcont                            (* global continuations of level i *)
    val replace_gcont : 'a gcont -> ('b gcont -> 'a) -> 'b
      (* captures current global continuation (of type 'b gcont), *)
      (* and replaces it with the first argument (of type 'a gcont) *)
    val cont2gcont : 'a cont -> 'a gcont     (* ML continuation to global continuation *)
  end

structure innermost_level :> SHIFT_RESET     (* level 0 *)
= struct
    exception InnermostLevelNoControl
    type answer = unit
    type 'a gcont = 'a cont    (* here, global continuation = ML continuation *)
    fun replace_gcont new_c e_thunk
      = callcc (fn old_c => throw new_c (e_thunk old_c))
    fun cont2gcont c = c
    fun reset _ = raise InnermostLevelNoControl
    fun shift _ = raise InnermostLevelNoControl
  end

functor sr_outer (type ans structure inner : SHIFT_RESET) :> SHIFT_RESET
                 where type answer = ans     (* from S = S_i to I = S_{i+1} *)
= struct
    exception MissingReset
    exception Fatal
    type answer = ans
    type 'a gcont                            (* Cont_I = Cont_S^+ *)
      = (answer inner.gcont) list * 'a inner.gcont
    val stack = ref [] : (answer inner.gcont) list ref   (* ks *)

    fun replace_gcont (new_ks, new_k) e_thunk
      (* captures and replaces the global continuation, recursively *)
      = inner.replace_gcont new_k
          (fn cur_k => let val cur_gcont = (!stack, cur_k)
                       in stack := new_ks; (e_thunk cur_gcont) end)
    fun cont2gcont action
      = ([], inner.cont2gcont action)
    fun popcc v                              (* rule popcc_I *)
      = case !stack of                       (* side condition (ks <> nil) *)
          [] => raise Fatal
        | k'::ks => (stack := ks; inner.replace_gcont k' (fn _ => v))
    val id_popcc = inner.cont2gcont (isolate popcc)      (* id_pop *)
    fun pushcc k e_thunk                     (* rule pushcc_I *)
      = inner.replace_gcont k
          (fn k' => (stack := k' :: !stack; e_thunk ()))
    fun reset e_thunk                        (* rule reset_I *)
      = pushcc id_popcc e_thunk
    fun shift k_abstraction                  (* rule shift_I *)
      = case !stack of                       (* side condition (ks <> nil) *)
          [] => raise MissingReset
        | _ => inner.replace_gcont id_popcc
                 (fn (k : 'a inner.gcont)
                    => k_abstraction (fn w => pushcc k (fn () => w)))
  end

Fig. 2. A native implementation of the CPS hierarchy in Standard ML of New Jersey

4 Application: layering monadic effects

As a significant application of composable continuations, Filinski’s work on 
adding user-defined monadic effects to ML-like languages by monadic reflec- 
tion shows that composing continuations is a universal effect, which can be used 
to simulate all effects expressible using a monad [12,13]. The original work only 
allowed one monadic effect, but recently, Filinski has extended the technique to 
allow layering effects by relating a heterogeneous tower of monads to a tower of 
continuation monads, and then implementing them using a collection of cells to 
hold the meta-continuations [15]. 

Independently, we directly adapted Filinski’s original one-level implementation
with minimal changes to parameterize the functor that generates a monad
representation by the monad representation layered beneath it, which also gives
an inductive implementation of a monadic hierarchy. We essentially put in each
structure of a monad representation the corresponding level in the control
hierarchy; the functor that generates an outer monad representation is passed the
control level of the monad representation at the inner layer, and applies functor
sr_outer to construct its own control level.

The benefit of this representation of layered monads is the same as in
Filinski’s work [15]: it is a direct implementation, i.e., no level of interpretation and
no level of translation hinder it [26,36].

More detail and several illustrative examples are available in the extended 
version of this article [7] . 

5 Related Work 

5.1 Felleisen’s seminal work 

As already mentioned in Section 1.2, the notion of control delimiters in direct 
style is due to Felleisen [8]. As already pointed out by Danvy and Filinski [5], 
control delimiters are significant because they fit in each level of the CPS hi- 
erarchy very naturally: they correspond to resetting the current continuation 
to the identity function; and indeed the control delimiter reset is equivalent 
to Felleisen’s. As for abstracting control, programming practice suggested the
control operator shift, which is equivalent to one of the variants of Felleisen’s
F-operator.

Felleisen’s work relies on a notion of control stack, and has inspired a number 
of similar control operators. Danvy and Filinski’s work relies on CPS, and has 
inspired a number of applications, for two compound reasons we believe: 

Expressiveness: Programming intuitions run strong in the world of control stacks. 
But lacking guidelines, how does one know, e.g., whether one has landed on [the 
continuation equivalent of] Algol 60’s control stack or on Lisp’s control stack — 
i.e., on the control equivalent of lexical scope or on dynamic scope (whichever 
may be best)? And how does one use the result? 

Conversely, the world of CPS is a structured one, which offers guidelines 
and holds much untapped expressive power. For example [12,13], Filinski has 




238 Olivier Danvy and Zhe Yang 



shown that the expressive power of the CPS hierarchy is equivalent to the one of 
computational monads. In fact, our new examples could equally well be expressed 
using a tower of monads. 

More specifically, operational descriptions of control hierarchies offer the pos- 
sibilities to shadow control delimiters, to capture them or not when abstracting 
control, to restore them or not when reinstating abstracted control, and to dy- 
namically search through them at run time. CPS shields us against the most 
extravagant of these mind-boggling possibilities, since by definition, programs 
with shift and reset denote CPS programs. These CPS programs may have 
many layers of continuations, but they are (1) purely functional and (2) stati- 
cally typed. 

Efficient implementation: A stack-based implementation of control tends to ex- 
ert a cost which is linear in the use of each captured continuation. Besides, and 
this is a well-known thesis in the continuation community [3], it faces a real 
problem of duplicated continuations. 

Therefore alternative implementations have been sought. For example, Filinski
already showed that shift$_1$ and reset$_1$ can be implemented concisely in
terms of callcc, which itself can be implemented efficiently [3,12,20]. Through
an alternative (but equivalent) formalism, our work essentially generalizes this
concise implementation to the whole CPS hierarchy, with no new cost and an
equivalent use.

5.2 Filinski’s work 

As a significant application of the CPS hierarchy, Filinski’s work on adding user- 
defined monadic effects to ML-like languages by monadic reflection shows that 
composing continuations is a universal effect, which can be used to simulate all 
effects expressible using a monad [12,13,15]. 

5.3 Gunter, Remy, and Riecke’s work 

Gunter, Remy, and Riecke present a new set of control operators generalizing 
exceptions and continuations, and its associated operational semantics and type 
system [16]. The strength of these operators lies in their static type system.
In comparison, and even though we do not doubt that there is one for the CPS
hierarchy (cf. Murthy’s work [28]), we do not present one here explicitly; instead,
we rely on ML’s type system in our implementation.

Independently of their type system, Gunter, Remy, and Riecke’s operators 
are not cast in stone. In their own words, “We do not feel, though, that there is 
a clear answer to the question of which operational rule is right; suffice it to say 
that we have picked one, and that the other rules lead to strong type soundness 
as well.” Similarly, we do not contend that shift and reset are the ultimate 
control operators — Filinski’s operators kreflect and kreify, for example, could 
well be preferred [12,13]. But we do believe that the key to their simplicity and 
expressiveness is the CPS hierarchy. 

Gunter, Remy, and Riecke’s operators are also implemented with callcc.

5.4 Operational semantics

Operational semantics, especially small-step reduction semantics, is often used 
to specify control operators formally. Several researchers have investigated the 
type soundness of languages with control operators via syntactic approaches 
based on operational semantics, such as Wright and Felleisen, and Harper, Duba, 
and MacQueen for first-class continuations [17,39], Gunter, Remy, and Riecke for 
generalizing exceptions and continuations [16], and Murthy for the CPS hierarchy 
[28]. Here, we use operational semantics to derive our implementation and to 
prove its correctness. Also, matching the CPS hierarchy, we present a family of 
continuation semantics instead of one monolithic semantics. This family can be 
natively programmed in ML without resorting to an informal notion of control 
stack. 

5.5 Continuations 

After 25 years of existence [32] , continuations still remain a challenging topic, to 
the point that ad-hoc frameworks are routinely preferred. For example, we find 
it significant that alternative and independent solutions were sought to compile 
goal-directed evaluation [30] and to abstract delimited control [8,16], even though 
two levels of continuations provide a simple, natural, and directly implementable 
solution to both problems. This indicates that continuations require more basic 
research. We have tried to contribute to this research by characterizing a specific 
notion of operational continuation semantics and by formalizing its connection 
to the traditional CPS transformation. 

6 Conclusion 

The CPS transformation is ubiquitous in many areas of computer science, includ- 
ing logic, constructive mathematics, programming languages, and programming. 
Iterating it yields a concise and expressive framework for delimiting and abstract- 
ing control — the CPS hierarchy — which appears substantial and fruitful but has 
been explored very little so far. In this article, we have contributed to exploring 
it by (1) characterizing an operational analogue of continuation semantics; (2) 
developing an analogue of the CPS transformation for such an operational con- 
tinuation semantics; (3) making it account for the family of control operators 
shift and reset; (4) providing a native implementation of the CPS hierarchy in 
the statically typed language Standard ML; and (5) illustrating the implemen- 
tation both with classical and with new applications, and in particular with a 
direct implementation of layered monads. 

Abstract. Run-time code generation (RTCG) and just-in-time compilation
(JIT) are features of modern programming systems to strike the
balance between generality and efficiency. Since RTCG and JIT techniques
are not portable and notoriously hard to implement, we propose
code splicing as an alternative for dynamically-typed higher-order
programming languages. Code splicing combines precompiled pieces of code
using higher-order functions. While this approach cannot achieve the
performance of compiled code, it can support some intriguing features:

— very fast “compilation” times; 

— satisfactory run times, compared with interpretation; 

— simple interfacing with compiled code; 

— portability. 

Starting from implementation models for functional languages we de- 
velop and evaluate several approaches to code splicing. This leads to 
some new insights into compilation techniques for functional program- 
ming languages, among them a compositional compilation schema to 
SKI-combinators. The progression of different techniques sheds some 
light on their relationship, specifically between combinator-based imple- 
mentations and closure-based implementations. 

All techniques have been implemented and evaluated in Scheme. 



1 Introduction 

Run-time code generation and just-in-time compilation generate code at run time 
and execute it subsequently. To amortize code generation time, RTCG and JIT 
only perform simple optimizations. They can still generate competitive code by 
exploiting invariants that are only available at run time. On the flip-side these 
techniques are inherently non-portable and hard to implement because many 
technical details and pitfalls must be catered for (flushing instruction caches, 
memory management, access permissions, and so on). 

However, there is some indication that RTCG is not always worth the effort. 
Indeed, Lee [17] stated that restructuring the code to observe a staging discipline 
already achieves significant speedups. So why go into the complications of RTCG
or JIT if some of the speedup is already available just by staging? 

* This work has been done while at the School of Computer Science and Information 
Technology, University of Nottingham, UK. The author acknowledges support by 
EPSRC grant GR/M22840 “Semantics of Specialization”. 

This is exactly the motivation for code splicing. Code splicing is a range
of techniques for writing staged interpreters. Such an interpreter keeps compile- 
time computations separate from run-time computations. Applying it to a source 
program ideally performs all compile-time computations and returns the com- 
piled (or code-spliced) program. It is not necessary to actually build a compiler. 
This style of “compilation” is attractive for a range of applications: 

— The execution of applets and other mobile code: receive high-level code from 
a network to perform integrity checks, but execute it efficiently. 

Many applets are throw-away code which is only executed a very limited 
number of times. Even JIT compilation may be too expensive [2]. 

— The implementation of flexible domain-specific languages. 

The emphasis is on quickly designing, implementing, and (possibly) modi- 
fying the language. Writing a full-blown compiler would be too expensive 
because the user community for such a language is often small and efficiency 
is not of paramount importance.

— Efficient metaprogramming. 

Metaprogramming systems [20] allow the generation of high-level program 
code and its subsequent execution in the same running program. Ideally, 
there should be no penalty for using generated code, but full compilation 
would be too slow. The approaches to combining partial evaluation and 
compilation also fall into this category [19,5]. 

— Overcoming restrictions of compilers. 

Some compilers have arbitrary restrictions on the size of code, the number 
of variables, and so on. Code splicing provides a way to compile and execute 
arbitrary programs, overcoming implementation restrictions. 

All these applications share the following requirements: 

1. Instantaneous compilation. The time for code splicing should be comparable 
to the time necessary to construct the corresponding source text. 

2. Satisfactory speed. Code-spliced programs should be significantly faster than 
interpreted ones. 

3. Easy interfacing. It should be possible to freely mix code-spliced and ordi- 
narily compiled code. 

4. Portability. 

The present paper investigates several code splicing techniques through imple- 
mentations in the higher-order functional programming language Scheme [15]. 
All techniques are portable and most of them meet all of the requirements. They 
exploit many features of Scheme, e.g., dynamic typing, side effects, and the eval 
function. 

The techniques are inspired by implementation techniques for functional pro- 
gramming languages (SKI-combinators, director strings, categorical combina- 
tors, closure conversion, and shallow binding). Most of them can be made to 
exhibit good staging properties. However, there are some surprises. For exam- 
ple, it turns out that the standard compilation method to SKI-combinators [21] 

ASyntax                        type of generated code (active syntax)
make-var     : Var → ASyntax
make-lam     : Var × ASyntax → ASyntax
make-app     : ASyntax × ASyntax → ASyntax
compile-term : ASyntax → Val

Fig. 1. Signature of code generation



is unsuitable for our purposes because it is not compositional. Therefore, we 
devise a compositional compilation scheme to SKI-combinators and prove some 
correctness properties about it. Deforestation [22] makes it reasonably efficient. 



Outline: Section 2 introduces our experimental setup. Section 3 explains the 
different approaches to compilation that we consider. Section 4 presents com- 
parative run times. Section 5 discusses related work and Section 6 concludes. 
Throughout the paper, we assume knowledge of the Scheme language [15]. 



2 The experimental setup 

Type-directed partial evaluation (TDPE) [7] is our dynamic supply for program 
text. When applied to a value of type τ and a representation of the type τ,
it constructs a pure lambda term of that type. To “compile” the constructed 
terms using code splicing, it is sufficient to provide staged interpreters for the 
lambda calculus. We distribute the implementation of such an interpreter over 
the syntax constructors for the lambda calculus to avoid the cost of the syntax 
dispatch. That is, the make-var function interprets a variable expression, the 
make-lam function interprets a lambda abstraction, and make-app implements
function application, taking the interpreted subexpressions as parameters. Fig. 1 
defines the types of these functions. Each interpreter provides its own definition 
of the type ASyntax (for active syntax) and the functions listed, the constructors 
for this type. The implementation includes multi-argument versions make-lam* 
and make-app* of lambda abstraction and application, as well as make-let. 



3 Approaches to code splicing 

The following subsections introduce different ways to achieve code splicing. Each 
choice distributes compile-time work between the active syntax constructors and 
compile-term in a different way. In one extreme (see Sec. 3.1), the active syn- 
tax constructors construct Scheme expressions and compile-term performs full 
compilation. In the other extreme [19], the active syntax constructors emit and 
combine byte code and compile-term is the identity function. 

(define (make-var x) x)

(define (make-lam x e) `(lambda (,x) ,e))

(define (make-app f a) `(,f ,a))

(define (compile-term e) (eval e (interaction-environment)))

Fig. 2. Active syntax constructors for Scheme



3.1 Using eval 

The Scheme standard [15] includes a function eval that maps a Scheme expression
to its value. Hence, the syntax constructors can construct Scheme expressions
(using quasiquote and unquote) and compile-term can be eval
(see Fig. 2).

Unfortunately, this implementation does not fulfill all of our requirements: 
compilation speed is heavily implementation and system dependent. Further- 
more, the compiled code is likely to uncover implementation restrictions of the 
underlying language implementation. 
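
As a minimal sketch of how the constructors of Fig. 2 are driven, the following builds the active syntax of a Church numeral by hand and compiles it with eval. The helper church is hypothetical (the paper obtains its terms from TDPE instead), and the code assumes a Scheme system that provides eval and interaction-environment at top level.

```scheme
;; Constructors as in Fig. 2: active syntax is plain Scheme source text.
(define (make-var x) x)
(define (make-lam x e) `(lambda (,x) ,e))
(define (make-app f a) `(,f ,a))
(define (compile-term e) (eval e (interaction-environment)))

;; Hypothetical helper: active syntax of the Church numeral for n,
;; i.e. (lambda (f) (lambda (x) (f (f ... (f x))))) with n uses of f.
(define (church n)
  (make-lam 'f
    (make-lam 'x
      (let loop ((k n) (body (make-var 'x)))
        (if (= k 0)
            body
            (loop (- k 1) (make-app (make-var 'f) body)))))))

;; All compile-time work happens inside eval.
(define three (compile-term (church 3)))

;; Convert back to an integer by iterating "add one" starting from 0.
((three (lambda (k) (+ k 1))) 0)   ; => 3
```

With these constructors the construction phase is cheap list building, so the cost and the limits of this variant are exactly those of the host system's eval.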



3.2 SKI combinators 

This approach takes advantage of a precompiled library of implementations of 
the combinators S, K, and I. After compiling the source program to a combi- 
nator term, an interpreter just sticks the precompiled combinators together as 
prescribed by this term. 

Unfortunately, the naive compilation generates abysmal code and requires 
multiple passes. An optimized compilation [21] generates better and smaller 
code, but it still requires multiple passes. However, we can do better. 



Compositional compilation to combinators Some research has gone into 
optimized combinator systems that keep the resulting terms small [21]. If we 
adopt the additional combinators B and S' defined by 

B = (lambda (x) (lambda (y) (lambda (z) (x (y z)))))

S' = (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x z)) (y z))))))

then there is a compositional specification of compilation. Let us consider the
three constructs in turn for ASyntax = CEnv → SKI where the compile-time
environment CEnv = Var* is the list of the bound variables in the reverse order
in which they were bound and SKI is the set of combinator terms.

make-var. An access to the $i$th variable in an environment of size $n$ compiles
into a projection function that returns the $i$th argument out of $n$ (where
$0 \le i < n$ and $n > 0$): $\lambda x_0 \ldots \lambda x_{n-1}.\, x_i$. This
function can be expressed as $K^{(i)}((BK)^{(n-i-1)}(I))$ where $X^{(0)} = I$
and $X^{(n+1)} = B\,X^{(n)}\,X$ is the iterated composition of $X$ (proof by
case analysis and induction). Using the equivalent definition $X^{(0)}(Y) = Y$
and $X^{(n+1)}(Y) = X^{(n)}(XY)$, we get for $i = 1$ and $n = 3$ the
combinator term $K(BKI)$.

make-lam. Compilation just needs to update the compile-time environment with
the new variable. The translation of the body provides the additional abstraction. 

make-app. If $n$ is the size of the environment, the combinator

$S_n\, f\, a = \lambda x_0 \ldots \lambda x_{n-1}.\, (f\, x_0 \ldots x_{n-1})\, (a\, x_0 \ldots x_{n-1})$

distributes $n$ values to the compiled subterms $f$ and $a$ of the application.
Rewriting this combinator using only S, K, and I leads to an explosion of the
size of the term. However, the S' combinator was conceived to solve this
problem. It is easy to show that, for all $n \ge 0$, $S_n = S'^{(n)}(I)$ (proof
by induction).
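
The three cases above can be sketched directly in Scheme, with the combinators represented by their compiled values rather than by combinator text. This is an illustrative sketch, not the paper's code: the helper names iterate and index are assumptions, S1 stands for S', and error is assumed to be available as in most Scheme systems.

```scheme
(define I (lambda (x) x))
(define K (lambda (x) (lambda (y) x)))
(define B (lambda (x) (lambda (y) (lambda (z) (x (y z))))))
(define S1                                ; the combinator S'
  (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x z)) (y z)))))))

;; X^(n)(Y): apply x to y exactly n times.
(define (iterate n x y)
  (if (= n 0) y (iterate (- n 1) x (x y))))

;; env lists bound variables innermost first; the result counts from the
;; outermost binder, so the outermost variable has index 0.
(define (index x env)
  (let loop ((e env))
    (cond ((null? e) (error "unbound variable" x))
          ((eq? (car e) x) (- (length e) 1))
          (else (loop (cdr e))))))

;; Projection K^(i)((BK)^(n-i-1)(I)).
(define (make-var x)
  (lambda (env)
    (let ((n (length env)) (i (index x env)))
      (iterate i K (iterate (- n i 1) (B K) I)))))

;; Only the compile-time environment changes.
(define (make-lam x e)
  (lambda (env) (e (cons x env))))

;; S_n = S'^(n)(I) distributes the n environment values to both subterms.
(define (make-app f a)
  (lambda (env)
    (((iterate (length env) S1 I) (f env)) (a env))))

(define (compile-term e) (e '()))

;; (lambda (x) (lambda (y) (lambda (z) y))) compiles to K(BKI).
(define second-of-three
  (compile-term
    (make-lam 'x (make-lam 'y (make-lam 'z (make-var 'y))))))
(((second-of-three 'a) 'b) 'c)   ; => b
```

Each variable access and each application rebuilds a chain of length proportional to the environment size, which is where the quadratic behavior discussed in the assessment comes from.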

Assessment The resulting combinator terms can be quadratic in the size of the 
source term (because the size n of the environment is only bounded by the size 
of the source term), hence the compile time is also at least quadratic. 

In addition, the granularity of the compiled code is too fine, leading to dis- 
appointing performance. 

Finally, dealing with multi-argument functions is awkward. The standard 
solution [10, sec. 12.2.3] is to introduce an untupling combinator 

U = (lambda (f) (lambda (x . xs) (apply (f x) xs))) 

and implement the translation of multi-argument abstractions accordingly. 

Director strings The key idea of this approach is the following invariant: 

pass only the values of the free variables to the compiled expression. 

For this optimization, we need further combinators B', C', and C'' defined by

B' = (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k x) (y z))))))

C' = (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x z)) y)))))

C'' = (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x y)) z)))))

During compilation, the compiler trims the environment according to the free
variables while keeping their order. This idea corresponds exactly to director
strings [16], where S', B', and C' correspond to ∧, \, and /. The make-xxx
functions compute the free variables on the fly at compile time.

(make-var x). By the invariant, only the value of x is passed to this compiled
expression. Hence the translation is I.

e = (make-lam x b). If x occurs free in b then the free variables of b are
exactly x and the free variables of e: nothing needs to be done.

If x does not occur free in b then the free variables of b are identical to the
free variables of e. In this case an additional abstraction must be conjured up.
If $n$ is the number of free variables and $A$ is the compiled body then
$C''^{(n)} K A$ does the job, since, for all $n \ge 0$,

$\lambda x_0 \ldots \lambda x_n.\, A\, x_0 \ldots x_{n-1} = C''^{(n)} K A$.

(define (make-var x)
  (lambda (env)
    (let* ((n (length env))
           (i (index x env)))
      (projection i n))))

(define (make-lam x e)
  (lambda (env)
    (e (cons x env))))

(define (make-app e1 e2)
  (lambda (env)
    (let* ((c1 (e1 env))
           (c2 (e2 env))
           (n (length env)))
      ((application n) c1 c2))))

(define (compile-term e1)
  (e1 '()))

Fig. 3. Active syntax constructors: customized combinators



e = (make-app f a). This case involves bookkeeping of the free variables of
f, a, and e. Let $x_1, \ldots, x_n$ be the free variables of e in the order of
their bindings and compile to $X_1(X_2 \cdots (X_n I))\, F\, A$, where $F$ and
$A$ are the compiled terms for f and a and, for $1 \le i \le n$,

- $X_i = S'$ if $x_i \in FV(f) \cap FV(a)$;

- $X_i = C'$ if $x_i \in FV(f) \setminus FV(a)$;

- $X_i = B'$ if $x_i \in FV(a) \setminus FV(f)$.
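
The choice of directors for an application can be sketched as follows. This is a hedged illustration rather than the paper's implementation: distributor is a hypothetical helper that builds the value of X_1(X_2 ... (X_n I)) from three lists of free variables, and S1, B1, C1 stand for S', B', and C'.

```scheme
(define I (lambda (x) x))
(define S1   ; S': pass the value to both subterms
  (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x z)) (y z)))))))
(define B1   ; B': pass the value to the argument only
  (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k x) (y z)))))))
(define C1   ; C': pass the value to the function only
  (lambda (k) (lambda (x) (lambda (y) (lambda (z) ((k (x z)) y))))))

;; fv-e: free variables of the application in binding order;
;; fv-f, fv-a: free variables of the function and argument subterms.
(define (distributor fv-e fv-f fv-a)
  (if (null? fv-e)
      I
      (let ((x (car fv-e))
            (rest (distributor (cdr fv-e) fv-f fv-a)))
        (cond ((and (memq x fv-f) (memq x fv-a)) (S1 rest))
              ((memq x fv-f) (C1 rest))
              (else (B1 rest))))))

;; Example: f has free variable p, a has free variables p and q.
;; Each compiled subterm expects exactly its own free variables, curried.
(define F (lambda (p) (lambda (n) (+ n p))))   ; compiled f
(define A (lambda (p) (lambda (q) (* p q))))   ; compiled a
(define E (((distributor '(p q) '(p) '(p q)) F) A))

((E 2) 5)   ; => 12, i.e. ((F 2) ((A 2) 5))
```

Here the director string is S' followed by B': p is sent to both subterms, q only to the argument.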

Deforestation Our implementation uses deforestation twice to eliminate interme- 
diate results. First, the active syntax construction is re-interpreted as compila- 
tion to a combinator expression. Second, the syntax constructors of the resulting 
combinator expression (combinator constants and application) are re-interpreted 
as the compiled values of the combinators and as function application. 



3.3 Customized combinators 

A possible improvement to using SKI combinators lies in providing precompiled
projection functions (projection i n) (with n > 0 and 0 ≤ i < n) and a
generalized S combinator (application n) that distributes n ≥ 0 arguments.

(define (projection i n)
  (lambda (x0) ... (lambda (xn-1) xi)))

(define (application n)
  (lambda (f) (lambda (a)
    (lambda (x0) ... (lambda (xn-1)
      (((f x0) ... xn-1)
       ((a x0) ... xn-1)))))))

With this approach, the active syntax constructors perform the entire
compilation, as shown in Fig. 3. The type of the active syntax is still
CEnv → Val where CEnv is as before. The translation of a variable selects a
projection using the size of CEnv and the index of the variable in it. The
translation of a lambda abstraction only pushes the new variable onto the
environment; it does not generate code. Make-app compiles its subexpressions
first. Then it splices the two pieces of

(define (make-var x)
  (lambda (env)
    (let ((j (index x env)))
      (lambda vs
        (list-ref vs j)))))

(define (make-lam x e)
  (lambda (env)
    (let ((c (e (cons x env))))
      (lambda vs
        (lambda (y)
          (apply c (cons y vs)))))))

(define (make-app e1 e2)
  (lambda (env)
    (let* ((c1 (e1 env))
           (c2 (e2 env)))
      (lambda xs
        ((apply c1 xs)
         (apply c2 xs))))))

(define (compile-term e)
  ((e '())))

Fig. 4. Active syntax constructors: multi-argument combinators



code together to an application using application. The function compile-term
initiates compilation by applying the value to the empty environment '().

This compilation scheme exhibits linear time behavior in practice. Asymp- 
totically it is quadratic due to the linear scans through the environment, whose 
length is also bounded by the size of the expression. 

There are at least three ways to implement this approach in Scheme. 

1. Generate the text of the projection and application functions as required,
compile them using eval, and cache the results.

2. Use generic versions of projection and application in Scheme.

3. A mixed approach provides the precompiled versions up to some fixed $n_0$,
falling back to a generic implementation for $n > n_0$.

A drawback is the complicated treatment of multi-argument functions which 
must be resolved in the same way as explained in Sec. 3.2. 
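
The first of the three options can be sketched as follows. The helpers var-names, curry-lambdas, curried-call, and cached are hypothetical names introduced for illustration, the cache is a simple association list, and the code assumes that eval and interaction-environment are available at top level.

```scheme
;; Names x0 ... x(n-1) as symbols.
(define (var-names n)
  (let loop ((k (- n 1)) (acc '()))
    (if (< k 0)
        acc
        (loop (- k 1)
              (cons (string->symbol
                      (string-append "x" (number->string k)))
                    acc)))))

;; Wrap body in one single-argument lambda per name.
(define (curry-lambdas names body)
  (if (null? names)
      body
      `(lambda (,(car names)) ,(curry-lambdas (cdr names) body))))

;; Text of the curried call (... ((f x0) x1) ...).
(define (curried-call f names)
  (if (null? names)
      f
      (curried-call `(,f ,(car names)) (cdr names))))

(define (projection-text i n)
  (let ((names (var-names n)))
    (curry-lambdas names (list-ref names i))))

(define (application-text n)
  (let ((names (var-names n)))
    `(lambda (f)
       (lambda (a)
         ,(curry-lambdas names
            `(,(curried-call 'f names) ,(curried-call 'a names)))))))

;; Compile each text at most once and remember the resulting procedure.
(define cache '())
(define (cached key make-text)
  (let ((hit (assoc key cache)))
    (if hit
        (cdr hit)
        (let ((v (eval (make-text) (interaction-environment))))
          (set! cache (cons (cons key v) cache))
          v))))

(define (projection i n)
  (cached (list 'projection i n) (lambda () (projection-text i n))))
(define (application n)
  (cached (list 'application n) (lambda () (application-text n))))

(projection-text 1 3)   ; => (lambda (x0) (lambda (x1) (lambda (x2) x1)))
```

These two procedures are what the constructors of Fig. 3 expect; a production version would probably replace the association list by a vector or hash table indexed by n.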



3.4 Multi-argument combinators 

The combinator approaches described in the preceding sections represent the 
run-time environment implicitly using abstractions “built into” the combinators 
S, K, and I, and generalizations thereof. This representation corresponds roughly 
to a list of the values of the variables, i.e., a nested representation of closures. It 
is inefficient because the free variables are passed one by one at each application 
and variable access. 

The next logical step consists of flattening these structures and passing them 
around using multi-argument functions. With this approach, one application 
passes the entire run-time environment. Consequently, the type of the active 
syntax is $\Pi\, env : Var^*.\ Val^{|env|} \to Val$ where $|env|$ is the size of
the environment. The staging is important: compilation executes the
$\Pi\, env : Var^* \ldots$ part, whereas the $Val^{|env|} \to Val$ part is left
till run-time.

Figure 4 shows the interpretation of the constructors. The variable case com- 
putes the index j of the variable in the run-time environment and returns the 
generic projection function (lambda vs (list-ref vs j)) which maps a tuple
of size at least j+1 to the value of its jth component. Make-lam transfers
the argument value y into the run-time environment. Effectively, it curries the
compiled body c of the abstraction. Make-app is a generalized S combinator.

It is intriguing to see that we can directly map to compiled code so that 
not much work is left for compile-term: it applies active syntax to the empty 
environment and runs the resulting thunk (a parameterless function).

The code uses generic implementations of the projection and application 
functions (which is viable due to the use of the flat representation of the run- 
time environment). It is again possible to generate customized versions for each 
particular arity and cache their compiled code as explained above. In this case, 
we also need currying functions for different sizes of the run-time environment. 
To our surprise, we found that the cost of caching was higher than the cost 
of using the generic implementation. We suspect that checking the number of 
arguments for functions of fixed arity is the culprit: the generic implementation 
uses variadic functions that do not check the number of their arguments. 
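
The staging of the multi-argument constructors can be observed with a small experiment. The sketch below transcribes the constructors of Fig. 4 and adds two assumptions for illustration: a definition of index (position in the innermost-first environment) and a counter lookups that records how often a variable is resolved.

```scheme
(define lookups 0)   ; hypothetical instrumentation: variable resolutions

(define (index x env)
  (set! lookups (+ lookups 1))
  (let loop ((e env) (j 0))
    (cond ((null? e) (error "unbound variable" x))
          ((eq? (car e) x) j)
          (else (loop (cdr e) (+ j 1))))))

;; Constructors as in Fig. 4.
(define (make-var x)
  (lambda (env)
    (let ((j (index x env)))
      (lambda vs (list-ref vs j)))))

(define (make-lam x e)
  (lambda (env)
    (let ((c (e (cons x env))))
      (lambda vs
        (lambda (y) (apply c (cons y vs)))))))

(define (make-app e1 e2)
  (lambda (env)
    (let* ((c1 (e1 env))
           (c2 (e2 env)))
      (lambda xs
        ((apply c1 xs) (apply c2 xs))))))

(define (compile-term e) ((e '())))

;; (lambda (f) (lambda (x) (f (f x))))
(define twice
  (compile-term
    (make-lam 'f
      (make-lam 'x
        (make-app (make-var 'f)
                  (make-app (make-var 'f) (make-var 'x)))))))

((twice (lambda (n) (* n 2))) 5)   ; => 20
((twice (lambda (n) (+ n 1))) 0)   ; => 2
lookups                            ; => 3
```

The counter stays at 3, one resolution per variable occurrence, no matter how often the compiled procedure runs: name resolution belongs entirely to the compile-time stage.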



3.5 Explicit linked-list run-time environment 

Another variation is inspired by the implementation technique of the categorical 
abstract machine [6]. Standard expository texts [11] employ the same technique. 

The type of the active syntax is CEnv → REnv → Val. The representation of
the run-time environment REnv is a linked list of vectors. Each vector holds the 
values of the variables abstracted by one surrounding lambda. The compile-time 
environment maps a variable to a depth and an offset, the depth determines the 
index in the outer linked list and the offset determines the position of the vari- 
able’s value in the vector. Hence, make-var relies on cached projection functions, 
one for each pair of depth and offset. 

Make-lam is implemented in terms of its multi-argument cousin make-lam*: 

(define (make-lam* xs e)
  (lambda (env)
    (let ((comp (e (cons xs env))))
      (lambda (rt-env)
        (lambda ys
          (comp (cons (list->vector ys) rt-env)))))))

Make-lam* is staged: It performs all computations that only depend on the 
compile-time environment env before it abstracts the run-time environment 
rt-env. The run-time part of make-app is the S combinator and compile-term 
just supplies the initial compile-time environment. 
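
A minimal sketch of the remaining constructors, under stated assumptions, may help to see how the pieces fit. Only make-lam* follows the text above; lookup, make-var, make-app*, and compile-term are illustrative guesses. In particular, this make-var builds a fresh closure instead of using cached projection functions, and this compile-term also supplies an empty run-time environment so that the result can be called directly.

```scheme
;; Compile-time environment: list of frames, each a list of names.
;; Returns (depth . offset) of the innermost binding of x.
(define (lookup x env)
  (let frames ((fs env) (depth 0))
    (if (null? fs)
        (error "unbound variable" x)
        (let slots ((names (car fs)) (offset 0))
          (cond ((null? names) (frames (cdr fs) (+ depth 1)))
                ((eq? (car names) x) (cons depth offset))
                (else (slots (cdr names) (+ offset 1))))))))

;; Run-time environment: list of vectors, innermost frame first.
(define (make-var x)
  (lambda (env)
    (let* ((pos (lookup x env))
           (depth (car pos))
           (offset (cdr pos)))
      (lambda (rt-env)
        (vector-ref (list-ref rt-env depth) offset)))))

(define (make-lam* xs e)                 ; as in the text
  (lambda (env)
    (let ((comp (e (cons xs env))))
      (lambda (rt-env)
        (lambda ys
          (comp (cons (list->vector ys) rt-env)))))))

(define (make-lam x e) (make-lam* (list x) e))

;; The run-time part is an S combinator over the run-time environment.
(define (make-app* f args)
  (lambda (env)
    (let ((cf (f env))
          (cargs (map (lambda (a) (a env)) args)))
      (lambda (rt-env)
        (apply (cf rt-env)
               (map (lambda (c) (c rt-env)) cargs))))))

(define (make-app f a) (make-app* f (list a)))

(define (compile-term e) ((e '()) '()))

;; (lambda (f a b) (f a b)) applied to +, 1, and 2
((compile-term
   (make-lam* '(f a b)
     (make-app* (make-var 'f) (list (make-var 'a) (make-var 'b)))))
 + 1 2)   ; => 3
```

Depth and offset are computed once per variable occurrence at compile time; at run time a variable access costs one list-ref and one vector-ref.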



3.6 Pass free variables 

A refinement of the multi-argument approach of Sec. 3.4 passes only the values 
of the free variables at run time. There is no need to maintain an environment at 
compile time, instead compilation generates the list of free variables along with 

(define (make-var x)
  (list (list x)
        (lambda (x) x)))

(define (make-lam x e)
  (let* ((fv-e (freevars e))
         (comp-e (compiled e))
         (fv (set-subtract fv-e x))
         (comp ((get-lambda* (length fv)
                             1
                             (names->index (append fv (list x)) fv-e))
                comp-e)))
    (list fv comp)))

Fig. 5. Combinators to pass free variables



the code and the code expects the values of the variables to be passed to it in 
the sequence specified by the list. Hence, the active syntax type is a dependent 
sum $\Sigma\, env : Var^*.\ (Val^{|env|} \to Val)$, i.e., a pair containing the list of the free
variables and the corresponding function. The functions freevars and compiled 
are the projections on the first and second component. 

Due to the invariant that exactly the values of the free variables are passed 
to each expression the compilation of variables is straightforward (see Fig. 5). 

Compilation of lambda abstraction is more involved, due to the fact that the 
abstracted variable may not occur free in the body. It relies on a cached function 
get-lambda* that takes the number of variables to be expected from the con- 
text ((length fv)), the number of variables abstracted (1), and a list of num- 
bers that select those variables that are passed on to the body ( (names->index 
(append fv (list x)) fv-e)). 

For applications, there is a similar cached function get-application* that
selects and distributes the values of the free variables. 
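
To make Fig. 5 executable, the helper functions can be sketched generically. Everything except make-var and make-lam is an assumption made for illustration: this get-lambda* is uncached and ignores its first two arguments (they would serve as the cache key), make-app inlines what get-application* is described to do, and compile-term assumes a closed term.

```scheme
(define (freevars as) (car as))
(define (compiled as) (cadr as))

(define (set-subtract names x)
  (cond ((null? names) '())
        ((eq? (car names) x) (set-subtract (cdr names) x))
        (else (cons (car names) (set-subtract (cdr names) x)))))

(define (set-union xs ys)
  (cond ((null? ys) xs)
        ((memq (car ys) xs) (set-union xs (cdr ys)))
        (else (set-union (append xs (list (car ys))) (cdr ys)))))

(define (position x names)
  (let loop ((ns names) (j 0))
    (cond ((null? ns) (error "name not found" x))
          ((eq? (car ns) x) j)
          (else (loop (cdr ns) (+ j 1))))))

;; For each wanted name, its position in names.
(define (names->index names wanted)
  (map (lambda (w) (position w names)) wanted))

;; Generic, uncached stand-in: nfree and nabs are unused here.
(define (get-lambda* nfree nabs indices)
  (lambda (comp-e)
    (lambda free-vals
      (lambda args
        (let ((all (append free-vals args)))
          (apply comp-e
                 (map (lambda (j) (list-ref all j)) indices)))))))

(define (make-var x)                      ; as in Fig. 5
  (list (list x) (lambda (x) x)))

(define (make-lam x e)                    ; as in Fig. 5
  (let* ((fv-e (freevars e))
         (comp-e (compiled e))
         (fv (set-subtract fv-e x))
         (comp ((get-lambda* (length fv) 1
                             (names->index (append fv (list x)) fv-e))
                comp-e)))
    (list fv comp)))

(define (make-app f a)
  (let* ((fv (set-union (freevars f) (freevars a)))
         (idx-f (names->index fv (freevars f)))
         (idx-a (names->index fv (freevars a)))
         (cf (compiled f))
         (ca (compiled a)))
    (list fv
          (lambda vals
            (let ((pick (lambda (idx)
                          (map (lambda (j) (list-ref vals j)) idx))))
              ((apply cf (pick idx-f)) (apply ca (pick idx-a))))))))

;; A closed term has no free variables, so its code takes no arguments.
(define (compile-term e) ((compiled e)))

(define apply-term
  (make-lam 'f (make-lam 'x (make-app (make-var 'f) (make-var 'x)))))
(freevars apply-term)                                   ; => ()
(((compile-term apply-term) (lambda (n) (+ n 1))) 41)   ; => 42
```

All set operations and index computations happen while the active syntax is being constructed, which matches the observation that this variant shifts its cost into construction time.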



3.7 Shallow binding 

Shallow binding [4] inspires an approach which represents variables by reference 
cells. It is safe to bind variables to fixed mutable cells at compile time, provided 
that each cell contains a stack of values with the current value on top. 

The type of the active syntax is CEnv → Unit → Val. There is no explicit
run-time environment. All name resolution is performed at compile time. 

To conserve space, we only consider the compilation of a lambda abstraction 
in Fig. 6. The compile-time environment env is a list of pairs of variable names 
and cells. Compilation creates a new cell c0 corresponding to the variable v0.
Next, it collects the cells that make up the current environment in cells. Finally,
it compiles the body of the lambda while binding v0 to c0.

The resulting thunk which is executed at run time forms a closure consisting 
of the current values of the cells in the environment and returns the real lambda 

(define (make-lam v0 e)
  (lambda (env)
    (let* ((c0 (make-cell '()))
           (cells (map cdr env))
           (body (e (cons (cons v0 c0) env))))
      (lambda ()
        (let ((freevals (map (lambda (c) (car (cell-ref c))) cells)))
          (lambda (x0)
            (cell-set! c0 (cons x0 (cell-ref c0)))
            (for-each (lambda (c x) (cell-set! c (cons x (cell-ref c))))
                      cells freevals)
            (let ((result (body)))
              (cell-set! c0 (cdr (cell-ref c0)))
              (for-each (lambda (c) (cell-set! c (cdr (cell-ref c))))
                        cells)
              result)))))))

Fig. 6. Compilation of lambda abstraction using references



(lambda (x0) ...). Applying the lambda establishes the new binding of the
variable v0 and installs the values of the free variables from the closure. After
running the body, it restores all bindings to their previous state before returning 
the result. 

The compilation of a variable obtains its associated cell and returns a thunk 
that returns the top of the stack stored in that cell. 

The compilation of an application wraps the application in a thunk that runs 
the thunks of the subexpressions as appropriate. 

On entry to the body of a lambda only the cells corresponding to free variables 
of the body need updating. This optimization is straightforward. 
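
The variable and application cases described above can be sketched as follows. The definitions of make-var, make-app, and compile-term are assumptions derived from the prose; make-lam is transcribed from Fig. 6; and cells are modelled here by one-element vectors, since make-cell, cell-ref, and cell-set! are not part of standard Scheme.

```scheme
;; Hypothetical cell representation: a one-element vector.
(define (make-cell v) (vector v))
(define (cell-ref c) (vector-ref c 0))
(define (cell-set! c v) (vector-set! c 0 v))

;; Variable: resolve the cell at compile time, read its top at run time.
(define (make-var v)
  (lambda (env)
    (let ((c (cdr (assq v env))))
      (lambda () (car (cell-ref c))))))

;; Application: run the thunks of both subexpressions, then apply.
(define (make-app f a)
  (lambda (env)
    (let ((cf (f env))
          (ca (a env)))
      (lambda () ((cf) (ca))))))

(define (make-lam v0 e)                   ; as in Fig. 6
  (lambda (env)
    (let* ((c0 (make-cell '()))
           (cells (map cdr env))
           (body (e (cons (cons v0 c0) env))))
      (lambda ()
        (let ((freevals (map (lambda (c) (car (cell-ref c))) cells)))
          (lambda (x0)
            (cell-set! c0 (cons x0 (cell-ref c0)))
            (for-each (lambda (c x) (cell-set! c (cons x (cell-ref c))))
                      cells freevals)
            (let ((result (body)))
              (cell-set! c0 (cdr (cell-ref c0)))
              (for-each (lambda (c) (cell-set! c (cdr (cell-ref c))))
                        cells)
              result)))))))

;; Compile in the empty environment, then run the resulting thunk.
(define (compile-term e) ((e '())))

;; (lambda (x) (lambda (y) x)): the inner closure still sees x = 1
;; after the outer call has popped its binding.
(define k (compile-term (make-lam 'x (make-lam 'y (make-var 'x)))))
(define k1 (k 1))
(k1 2)   ; => 1
```

The example exercises the closure mechanism: when k1 is created, the value 1 is saved in freevals, and it is pushed back onto the cell for x each time k1 is applied.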



4 Results 

This section reports practical experiments with implementations of most of the 
approaches described. We have run three sets of benchmark programs on two 
different interpreted Scheme implementations, Scheme48 version 0.51 and Gam- 
bit 3.0. Scheme48 compiles to byte code for subsequent interpretation whereas 
Gambit is a quasi source-level interpreter. All measurements were performed on 
a 233MHz Pentium II machine with 256MB of memory, running FreeBSD 2.2.5. 
We report the average time of ten runs of the same computations, using the 
respective time commands of the Scheme systems. 

We have measured the following variations of code splicing: 

scheme    section 3.1, generate Scheme source and compile using eval;
ski-opt   section 3.2 but using an intermediate combinator expression;
ski-comp  is the fully deforested version;
flat      section 3.4, pass the run-time environment as a flat tuple;
linked    section 3.5, implement the environment by a linked list;
free      section 3.6, flat environment restricted to the free variables;
ref       section 3.7, naive shallow binding; ref-free only updates free variables.

We have measured times for 

constr construction: generate an object of type ASyntax; 
comp compile with compile-term; 
run run the resulting procedure. 

We have used TDPE to generate lambda expressions as follows: 

church/n constructs the text of the Church numeral for n (see Table 1); 
tiny specialize the Tiny (a little imperative language [18]) interpreter wrt. a 

factorial program (see Table 2); 

mixwell specialize the Mixwell (a first-order functional language [14]) inter- 
preter wrt. a program with about 300 functions (see Table 3). 

In almost all cases, the construction time of the active syntax is equal to the 
construction time of the corresponding source. The exception is free which per- 
forms a free variable analysis at construction time. This amounts to a slowdown 
by a factor of 10 for church and 4 for tiny and mixwell. 

With Scheme48, the time taken for construction and splicing is linear in the 
size of the source term. In Gambit, it seems to be quadratic. Compilation using 
eval seems to take quadratic time for both systems. In the church benchmark, 
Scheme48 ran out of memory for scheme/10000 given the maximum possible 
heap size -h 33539072. 

The holes in the mixwell table come from heap overflow for linked and an 
implementation restriction for scheme: the distributed version limits the nesting 
depth of bindings to 256 while the mixwell program has a nesting depth of well 
over 300. The table demonstrates that code splicing overcomes such restrictions.¹

Overall, the free approach comes out fastest for all benchmarks. However,
its compilation time is much slower than all others. The second choice is
between linked and ref-free which combine extremely fast compilation time with
fairly good execution times. The church benchmark seems to give a fairly
distorted picture because of its uncharacteristic behavior for ref-free. The tiny
and mixwell programs appear to yield more realistic results.

In terms of compilation time of ski-comp vs. ski-opt, deforestation pays 
off: ski-comp is two times faster. In terms of run time, ski-comp is slightly 
slower because ski-opt avoids some uses of the I combinator by inspecting the 
text of the combinator expression; this is impossible for ski-comp which does 
not generate an intermediate result. 

For the tiny and mixwell benchmarks, it is interesting to compare with 
the run time of the original interpreter prior to specialization with TDPE. For 
tiny, its run time is 0.47 seconds for Scheme48 and 0.165 seconds for the Gam- 
bit interpreter. For mixwell, it is 0.712 seconds (Scheme48) and 0.64 seconds 

¹ This restriction has been removed in the meantime.

                        Scheme48               Gambit Interpreter
technique           constr    comp     run     constr      comp       run
scheme/100            0.00    0.28    0.00      0.005     0.003     0.001
scheme/1000           0.03   23.08    0.00      0.085     0.069     0.005
scheme/10000          0.37       -       -      2.640     3.510     2.776
scheme/100000         3.75       -       -    206.993   352.020    91.521
ski-opt/1000          0.03    0.22    0.01      0.073     0.477     0.703
ski-opt/10000         0.37    2.26    0.17      2.649     9.535    17.414
ski-opt/100000        3.72   23.65    1.71    191.800   547.981   740.677
ski-comp/1000         0.03    0.10    0.01      0.087     0.370     0.684
ski-comp/10000        0.37    1.14    0.19      2.676     6.174    21.829
ski-comp/100000       3.74   11.66    1.95    204.936   276.527   868.653
flat/1000             0.03    0.03    0.01      0.087     0.066     0.352
flat/10000            0.37    0.32    0.17      3.649     3.020     8.827
flat/100000           3.71    3.28    1.76    206.176   280.349   450.799
linked/1000           0.04    0.13    0.02      0.090     0.126     0.888
linked/10000          0.40    1.33    0.25      3.099     4.161    13.969
linked/100000         4.08   13.47    2.43    201.652   354.373   445.056
free/1000             0.32    0.00    0.00      0.394     0.000     0.349
free/10000            3.24    0.00    0.08      7.775     0.000     6.677
free/100000          32.55    0.00    0.80    240.686     0.000   125.246
ref/1000              0.03    0.03    0.00      0.086     0.061     0.344
ref/10000             0.36    0.39    0.07      3.909     2.841     9.201
ref/100000            3.68    3.96    0.79    214.278   275.319   385.260
ref-free/1000         0.03    0.07    0.00      0.161     0.117     0.355
ref-free/10000        0.37    0.73    0.08      4.695     3.886    10.433
ref-free/100000       3.74    7.50    0.87    214.103   322.110   604.603

Table 1. Run times for Church numerals (in sec)





                          Scheme48                      Gambit Interpreter
technique      constr     comp      run    ratio   constr     comp      run    ratio
linked           0.01     0.01     0.52     1.13    0.011    0.008    0.275    1.666
flat             0.01     0.00     0.72     1.55    0.011    0.001    0.523    3.169
free             0.04     0.00     0.43     0.94    0.030    0.000    0.182    1.103
ref-free         0.01     0.00     0.72     1.53    0.010    0.009    0.534    3.236
ref              0.01     0.00     1.03     2.26    0.012    0.002    0.875    5.303
scheme           0.01     0.04     0.40     0.85    0.013    0.003    0.039    0.236

Table 2. Timings for the Tiny interpreter (sec)
                          Scheme48                      Gambit Interpreter
technique      constr     comp      run    ratio   constr     comp      run    ratio
linked           3.45        -        -        -    3.870   32.329    1.158     1.89
flat             3.38     4.36     2.06     2.90    3.841    0.852    2.407     3.70
free            13.44     0.00     0.14     0.20    7.826    0.000    0.093     0.15
ref              3.33     6.42     3.80     5.34    3.743    1.582    3.966     6.26
ref-free         3.35     6.42     0.32     0.44    3.738    2.858    0.843     1.37
scheme           3.42    11.88        -        -    3.723    2.966    0.036     0.06

Table 3. Timings for the Mixwell interpreter (sec)



(Gambit interpreter). The ratio column contains the run time of the compiled 
version divided by the run time of the fully interpreted version. 

Both experiments require multi-argument lambda abstraction and application.
The corresponding active syntax constructors are not implemented for ski-opt
and ski-comp, hence there are no results for these techniques.

The very first runs of linked and free are about an order of magnitude 
slower than the rest because they fill the caches. 



Assessment Obviously, our mileage varies depending on the system that we 
use. For the Scheme48 system, which compiles to byte code, the code splicing 
approach seems to be viable and the free and ref-free implementations give 
encouraging results. 

For the Gambit interpreter, the implementation of eval is blindingly fast 
and gives very good results, so this is the method of choice for the Gambit 
system. Why? Gambit itself uses a code splicing strategy to implement eval [9],
but since the Gambit interpreter itself is compiled, eval splices fully compiled
code. In contrast, our experiments were conducted with the interpreter only.
Therefore, we expect to obtain better results when we use Gambit’s compiler 
because in that case our combinators are fully compiled, too. 

5 Related work 

Writing staged interpreters has become popular since Feeley’s thesis [8,9]. It is 
now a standard technique that is taught in introductory textbooks [1]. The partial
evaluation community also exploits this style and a set of transformations
to improve the staging properties to specialize programs and generate compilers 
efficiently [13]. Holst and Gomard [12] show that the same style and very similar 
transformations enable a lazy functional programming language to achieve sim- 
ilar specialization effects as partial evaluators. In contrast, we are considering 
staging for strict functional programs. 

One of our sources of inspiration is Augustsson’s ingenious implementation 
of lmli, the interactive part of the lazy ML compiler [3]. Lmli compiles an
interactive definition to an (SKI) combinator expression and maps it into executable
code by folding the expression with respect to the compiled definitions for the
combinators (with type checking turned off). This results in respectable speed 
for code typed in from the terminal and at the same time trivial interfacing 
with compiled code from other modules and with the run-time system, exactly 
the goals of our setting. Our design space is somewhat less constrained than 
Augustsson’s: Since we are using an untyped, strict, and impure language, there 
are interesting options to consider besides SKI combinators. 

Two works deal with compiling specialized source to byte code on-the-fly. 
Sperber and Thiemann [19] implement a back end for a traditional partial eval- 
uator. As in the present work, they reinterpret the active syntax, but their im- 
plementation generates byte code on-the-fly. Balat and Danvy [5] construct an 
internal representation of the source term which they submit in toto to the byte 
code compiler. They do not deforest the intermediate result. Like the present 
work they implement a back end for TDPE, thus side-stepping problems with 
top-level definitions and primitive operations that contributed to the complexity 
of the other work [19]. 

6 Conclusion 

We have investigated part of the design space for compilation by code splic- 
ing. We avoid an expensive compilation step by using higher-order functions to 
splice together precompiled pieces of code at run time. Each of the techniques 
that we propose has counterparts in the implementation of functional program- 
ming languages. Similar to the tradeoffs between the different implementation 
techniques, the splicing techniques have tradeoffs that depend on the particular 
source programs and on the underlying implementation. 
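
The idea of splicing precompiled pieces with higher-order functions can be sketched as follows. This is a minimal illustration in Python rather than Scheme, and it is not one of the implementations measured above: the constructor names (lit, var, lam, app, prim) are hypothetical, and environments are plain dictionaries. Each constructor returns a closure, so building a term only composes existing code and never calls a compiler.

```python
import operator

# Each "active syntax" constructor returns a closure env -> value.
# Constructing a term therefore only links closures together.

def lit(value):
    return lambda env: value

def var(name):
    return lambda env: env[name]

def lam(param, body):
    # The body closure is built once; each call only extends the environment.
    return lambda env: (lambda arg: body({**env, param: arg}))

def app(fn, arg):
    return lambda env: fn(env)(arg(env))

def prim(op, left, right):
    return lambda env: op(left(env), right(env))

# Church numeral two = \f. \x. f (f x), applied to successor and zero.
two = lam("f", lam("x", app(var("f"), app(var("f"), var("x")))))
succ = lam("n", prim(operator.add, var("n"), lit(1)))
program = app(app(two, succ), lit(0))
result = program({})
```

In the terminology of the tables, building program corresponds to the constr phase and calling it corresponds to run; there is no separate comp phase in this sketch.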

Some of the methods have not been considered before in this context, most 
notably the compositional compilation to SKI combinators, director strings, and 
the strategy that employs shallow binding. 
Abstract. A program should document its organization and decisions
about the programming process. Since the programmer’s thinking about 
programming and program organization continually evolves, languages 
inevitably prove unable to state these decisions in a precise and adequate 
fashion. Macro systems could provide a convenient way to extend a lan- 
guage with such statements, if they had more structure than traditional 
C- and Lisp-style macros provide. 

With our system, McMicMac, designers can express a variety of specifications
as language constructs, including program representations of design
patterns, high-level recursive programming operators, and collaboration-based
design mechanisms. Unlike traditional macro systems, McMicMac
offers a simple yet powerful means for describing specifications, prevents
unintentional name clashes, provides feedback in terms of the programmer’s
source, and has modular mechanisms for managing specifications.
We have implemented and used McMicMac to define several groups of
extensions.



1 Introduction 

A program specifies not only behavior, but also decisions concerning its orga- 
nization and about the programming process. It expresses this information in 
terms of the constructs of the programming language in which it is written. Any 
language, however, has only a limited set of constructs, while programmers con- 
stantly invent new concepts and grow their vocabulary about both the program 
and the programming process. This leads to a gap in expressiveness. 

Many programming methodologies try to narrow this gap. Program design 
patterns [12] that capture some details of the structure of the program are one 
example; designs based on collaborations [3] are another. Yet other examples are 
abstract structure operators [22] and adaptive programming specifications [20]. 
They permit programmers to describe a program’s organization and develop- 
ment, and can also be mapped into constructs in the underlying programming 
language. 
These specifications are used in various ways. They are typically described
only in comments, translated into code by hand, or embedded in the program 
as linguistic extensions that are processed by special-purpose tools. Comments 
are unchecked and therefore unreliable. Implementing specifications manually is 
painstaking, deprives program maintainers of crucial information, and makes it 
difficult to ensure that constantly-evolving programs are meeting the intended 
specifications. In contrast to these, special-purpose tools can both verify that 
the program satisfies the specification and translate the specifications into in- 
structions in the underlying programming language. 

Unfortunately, these special-purpose tools take considerable effort to imple- 
ment. Each tool must parse the complete programming language and its own 
extensions, and implement its own rewriting strategy. Also, since these tools 
hard-wire their understanding of the underlying language, they may not be able 
to process a program extended with constructs from some other specification 
process. Finally, extending such tools may be cumbersome, and can differ con- 
siderably between tools. 

We propose an alternate approach to expressing such structural properties. 
Our approach, based loosely on Scheme macros, offers more structure and more 
facilities than traditional C or Lisp-style macro expanders. It simplifies the task 
of specifying properties, and tries to preserve their integrity when communi- 
cating information between programming tools and the programmer. Finally, by 
providing a common framework, it allows different specifications to interact atop 
the same language and with each other. 

The rest of this paper is organized as follows. Section 2 discusses some of the 
structural properties that programmers use, and how they can be implemented. 
Section 3 describes features that would be useful to implement the specifications 
of Section 2, and how our implementation supports them. Section 4 surveys the 
literature of related work. The last section summarizes the ideas in this paper, 
and presents some directions for future work. 

2 Programming with Specifications 

In this section we present some examples of structural properties and discuss 
how to integrate them into the programming process. 



2.1 Software Patterns 

Design patterns sketch a solution to a class of related problems. By customizing 
a pattern to a given context, a programmer can reuse the embodied solution. 
Some patterns represent general “architectures” with respect to a programming 
language that can be instantiated effectively in different contexts. We call these 
architectures software patterns. 

Figure 1 illustrates a sample use of a software pattern^ in Java. The code 
in the figure uses the Adapter pattern, which enables the reuse of an existing 

^ All the patterns mentioned in this paper are taken from Gamma, et al. [12]. 
interface OutputRoutines {
  void drawLine ();
  void printText (); }
class VendorGraphics {
  void drawSolidLine () {};
  void renderText () {}; }

// GraphicalOutput adapts VendorGraphics to OutputRoutines
class GraphicalOutput implements OutputRoutines {
  VendorGraphics lowLevelDriver;
  public void drawLine () { lowLevelDriver.drawSolidLine (); }
  public void printText () { lowLevelDriver.renderText (); } }



Fig. 1. Adapter Pattern: Java Version 



(interface OutputRoutines
  (methods (void drawLine ())
           (void printText ())))

(class VendorGraphics
  (methods (void drawSolidLine ())
           (void renderText ())))

(Adapter GraphicalOutput adapts VendorGraphics
                         to OutputRoutines
                         as lowLevelDriver
  (fields)
  (methods
    (public void drawLine () (lowLevelDriver . drawSolidLine ()))
    (public void printText () (lowLevelDriver . renderText ()))))



Fig. 2. Adapter Pattern: Specification-Based Version 



class with a different interface. More specifically, suppose an existing class imple- 
ments the desired functionality but does not implement the desired interface. An 
Adapter acts as a surrogate for this class by implementing the desired interface, 
forwarding requests to instances of the existing class and tailoring responses to 
its interface. 

In our example, OutputRoutines is an interface that represents output gener- 
ators, and is implemented by actual generators on various devices. The vendor- 
provided VendorGraphics performs graphical output, but VendorGraphics does 
not implement OutputRoutines, and its interface is slightly different. In Figure 1, 
the programmer creates an Adapter class, GraphicalOutput, that forwards re- 
quests to an instance of VendorGraphics. (For brevity, we elide the details of 
the actual methods.) The relationship is documented informally through the 
comment. 

The programmer could now make the mistake of defining a class Client in which
some variable var is declared with type VendorGraphics but assigned an instance
of GraphicalOutput. Unfortunately, VendorGraphics is the class being adapted
rather than the interface it is being adapted to:

class Client {

VendorGraphics var = new GraphicalOutput ();

}

The type-checker flags this assignment with the following error message:

test.java:13: Incompatible type for new. Can’t convert
GraphicalOutput to VendorGraphics.

VendorGraphics var = new GraphicalOutput ();



This example illustrates several effects of programming directly in terms of
a pattern’s constituent constructs:

— The implementation of the Adapter pattern in terms of plain classes and
interfaces obscures the pattern’s identity, and thus decreases the clarity of
the program. This makes it more difficult for readers to understand the
structure and intent of the code.

— It increases the potential for errors due to the volume of code. Pattern code
handles administrative tasks such as maintaining invariants, which a programmer
must correctly implement during the initial development and remember
to update during maintenance. Even in this simple example, for
instance, the programmer must remember to make GraphicalOutput implement
the interface OutputRoutines, and declare the type of lowLevelDriver
as VendorGraphics.

— It becomes difficult to replace the code for the pattern instance with an
improved version, e.g., one that is more extensible or offers better protection
against errors.

— All feedback is reported in terms of the individual units of code, and the
programmer must then manually extrapolate from the error message to the
original pattern-based design.

In short, programming directly with the constituents of patterns can potentially
affect programmers at every stage: implementation, debugging and maintenance.

Figure 2 shows the same program (translated into a parenthesized version
of Java), except it makes the Adapter explicit by using an Adapter construct,
which is translated into the equivalent of the code in Figure 1. The use of an
Adapter makes it clear that there is a mismatch between OutputRoutines and
VendorGraphics, and highlights how this can be overcome. The equivalent of
Client’s Java declaration is then

(class Client

(fields (VendorGraphics var = (new GraphicalOutput ()))))

This results in the error message

test.cj:15.33-15.56: Incompatible assignment type. Can’t convert
GraphicalOutput to VendorGraphics.

GraphicalOutput created by Adapter at test.cj:7.2-7.8.

which uses the Adapter declaration to report the error in terms of the program- 
mer’s pattern code, not just the expanded constructs. The error message frees 
the programmer from having to track manually what each construct can intro- 
duce, which can be especially difficult since the expansion of the construct can 
change over time. This error message is produced by a generic type-checker for 
our parenthesized version of Java; Section 3.3 describes how this information is 
generated in greater detail. 

2.2 Composition Invariants 

Smaragdakis and Batory [24] present several approaches to validating the cor- 
rectness of module compositions. They express the compositional requirements 
through the type system. For instance, suppose the implementation of a virtual 
machine could include a module that tags the memory representations of data 
(Tag) and another that implements garbage collection (GC). Furthermore, sup- 
pose that the GC module requires Tag to be present in the module composition. 
To ensure that the Tag module has already been added, GC declares a dummy
variable of type TagIncludedProperty, which is the name of an empty class
declared only by the Tag module. Thus, if Tag is not included in the composition
hierarchy before GC, the dummy variable declaration will raise a type error,
which indicates an erroneous composition.

Implementing this approach via explicit constructs for validating composi- 
tions offers many advantages: 

1. These class and variable declarations play no role in the execution of the 
program. Therefore, they obscure its behavior, potentially making it more 
difficult to maintain. 

2. Some of these invariants are tricky to implement. It is thus safer to have 
them implemented automatically by a program than manually by the pro- 
grammer. This is especially important since some negative properties require 
a quadratic number (in the size of the program and the number of properties) 
of dummy variable declarations. Also, when a new property is introduced, 
the programmer has to add it to each class by hand. With a properly de- 
signed language extension, the programmer can instead establish a single 
point of control and greatly reduce the work required to specify the pro- 
gram’s properties. 

3. When an invariant is violated, the programmer should get an error in terms 
of the actual invariant, not just in terms of the (often mangled) names used 
to represent it in the source. Smaragdakis and Batory identify this issue as 
one of the major problems in their approach [24, pg. 562]. 

4. While adding the dummy variable declarations, a programmer might acci- 
dentally choose a name that is already in use for a different purpose. 




Expressing Structural Properties as Language Constructs 263 



2.3 Other Specifications 

Several other specification techniques can be handled with our proposed solution. 
For instance, abstract structure operators [22] are used to simplify the process of 
describing traversals over recursively-defined types. They are generated automat- 
ically from the type specification. Instead of defining recursive procedures over 
these types, programmers use and compose these high-level operators, which are 
then translated into regular recursive procedures. The translation again raises 
the issues of name-clash management and feedback-reporting, which can be han- 
dled by our proposal. 

3 Programming Support for Specifications 

In the preceding section, we have mentioned several desirable properties for a 
programming tool that supports specifications. To summarize, such a tool 

1. should make it easy to describe the syntax of the property; 

2. should permit a simple specification of the translation into code fragments 
for those properties that require an implementation; 

3. must help the designer avoid name capture problems; 

4. should provide feedback to programmers in terms of the original specifica- 
tions, not just their expansions; and, 

5. should support the definition of disjoint groups of specifications. 

We have developed a language elaborator, McMicMac, that meets these require- 
ments. In this section, we describe the key properties of McMicMac, and explain 
how they address the aforementioned needs. For information concerning McMic- 
Mac’s implementation, we refer the reader to the technical report [18]. 

3.1 Describing Specifications 

Many specifications have operational meanings. For such specifications, designers 
must be able to specify the transformation that maps an instance of a specifica- 
tion into constructs in the underlying language. The mechanism must be both 
convenient enough to simplify the addition of new transformations, and powerful 
enough to specify potentially complex ones. 

A software pattern, for example, can be thought of as a parametric body of 
code that is specialized at each particular use. To use a pattern, the programmer 
must provide code fragments for the parameters. The pattern implementation 
assembles a program component by splicing the code fragments into the parame- 
ter positions of the body of code. Thus the pattern designer needs a tool that (1) 
permits the definition of parameterized bodies of code, and (2) generates code 
from arguments supplied for the parameters. McMicMac supports these opera- 
tions felicitously through shape matching,^ a mechanism due to Kohlbecker and 
Wand [17]. 

^ This is traditionally known as “pattern matching” , but we adopt different terminol- 
ogy to avoid confusion with software patterns. 
(define-macro Adapter
  (rewrite
    (Adapter adapterName adapts adapteeType
                         to desiredinterface
                         as adapteeVariable
      (fields field-decls ...)
      (methods method-decls ...))
    (with-keywords adapts to as fields methods)
    (as
      (class adapterName implements desiredinterface
        (fields (adapteeType adapteeVariable)
                field-decls ...)
        (methods method-decls ...)))))



Fig. 3. Adapter Pattern Definition 



Shapes are built from keywords, shape variables, and sequences. The shape- 
matcher works by comparing each phrase in a program against the collection 
of defined shapes. A phrase matches a shape if the phrase uses keywords and 
sequences in the same manner as the shape. In this case, the shape matcher binds 
the corresponding parts of the inputs to the shape variables in the template. It 
then generates an output phrase based on the corresponding macro definition, 
which yields a new phrase. It expands this phrase again, continuing until the 
phrase does not match any defined shape. Figure 3 presents the definition of 
the Adapter pattern, which is used in Figure 2. This use results in code that is 
equivalent to that of Figure 1. 

The matcher treats ellipses (...) specially. An ellipsis follows a (“head”) shape 
in a sequence,^ and matches a source sequence of zero or more instances of the 
head shape. It binds each shape variable in the head shape to a sequence. This 
sequence consists of the sub-terms, in order, of the terms in the source sequence 
that correspond to the shape variable’s position in the head shape. Ellipses can 
be nested to arbitrary depth. Each nesting level introduces a nested sequence in 
the binding of a shape variable. 
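
The matching rules above can be sketched in a few lines. The following is only an illustration, not McMicMac's shape matcher: it assumes that phrases are nested Python lists of strings, that the keyword set is passed in explicitly, and that each sequence contains at most one ellipsis. The function names match and variables are hypothetical.

```python
ELLIPSIS = "..."

def variables(shape, keywords):
    """List the shape variables in a shape, in order of appearance."""
    if isinstance(shape, str):
        return [] if shape in keywords or shape == ELLIPSIS else [shape]
    return [v for part in shape for v in variables(part, keywords)]

def match(shape, term, keywords):
    """Return a dict of bindings if term matches shape, otherwise None."""
    if isinstance(shape, str):
        if shape in keywords:
            return {} if term == shape else None   # keywords match only themselves
        return {shape: term}                       # a shape variable matches any term
    if not isinstance(term, list):
        return None
    bindings, i, j = {}, 0, 0
    while j < len(shape):
        head = shape[j]
        if j + 1 < len(shape) and shape[j + 1] == ELLIPSIS:
            # Simplification: every part after the ellipsis consumes one term.
            count = len(term) - i - (len(shape) - j - 2)
            if count < 0:
                return None
            sequences = {v: [] for v in variables(head, keywords)}
            for item in term[i:i + count]:
                sub = match(head, item, keywords)
                if sub is None:
                    return None
                for name, value in sub.items():
                    sequences[name].append(value)
            bindings.update(sequences)
            i, j = i + count, j + 2
        else:
            if i >= len(term):
                return None
            sub = match(head, term[i], keywords)
            if sub is None:
                return None
            bindings.update(sub)
            i, j = i + 1, j + 1
    return bindings if i == len(term) else None

shape = ["methods", "method-decls", ELLIPSIS]
term = ["methods",
        ["public", "void", "drawLine"],
        ["public", "void", "printText"]]
bound = match(shape, term, {"methods"})
```

Here bound maps method-decls to a sequence of two terms. Because a nested ellipsis is matched by the recursive call, a variable under two ellipses is bound to a sequence of sequences, as described above.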

The definition and application of the Adapter pattern illustrate the use of
ellipses. One sub-shape of the Adapter is (methods method-decls ...). In the
example, there are two methods in the shape:

(methods

(public void drawLine () (lowLevelDriver . drawSolidLine ()))
(public void printText () (lowLevelDriver . renderText ())))

The shape-matcher binds method-decls to a sequence of two pieces of code:

(public void drawLine () (lowLevelDriver . drawSolidLine ()))

and

(public void printText () (lowLevelDriver . renderText ()))

^ The head of a shape cannot be ..., else the shape is considered ill-formed.

As Figure 3 suggests, the shape-matcher not only verifies that the specifi- 
cation has a proper form, it also generates output. Put differently, adding a 
specification via McMicMac assigns a precise meaning to a specification. The 
shape-generator works by essentially inverting the matching process. In particu- 
lar, it too can handle nested ellipses; each shape variable must appear under as 
many ellipses in the output shape template as it did in the input template. 
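
Generation can be sketched as the inverse walk over an output template. As before, this is an assumption-laden illustration rather than McMicMac's generator: templates are nested Python lists, names without a binding are copied through unchanged, and every bound variable that appears under an ellipsis is assumed to be a sequence variable of the same length. The names instantiate and template_names are hypothetical.

```python
ELLIPSIS = "..."

def template_names(template):
    """List every name that occurs in a template, in order."""
    if isinstance(template, str):
        return [] if template == ELLIPSIS else [template]
    return [n for part in template for n in template_names(part)]

def instantiate(template, bindings):
    """Build an output phrase by substituting bindings into a template."""
    if isinstance(template, str):
        return bindings.get(template, template)   # unbound names copy through
    out, j = [], 0
    while j < len(template):
        part = template[j]
        if j + 1 < len(template) and template[j + 1] == ELLIPSIS:
            # Assumption: all bound names under the ellipsis are sequences
            # of equal length; emit one copy of `part` per element.
            names = [n for n in template_names(part) if n in bindings]
            length = len(bindings[names[0]]) if names else 0
            for k in range(length):
                local = dict(bindings)
                for n in names:
                    local[n] = bindings[n][k]
                out.append(instantiate(part, local))
            j += 2
        else:
            out.append(instantiate(part, bindings))
            j += 1
    return out

template = ["class", "adapterName", "implements", "desiredInterface",
            ["fields", ["adapteeType", "adapteeVariable"], "field-decls", ELLIPSIS],
            ["methods", "method-decls", ELLIPSIS]]
bindings = {"adapterName": "GraphicalOutput",
            "desiredInterface": "OutputRoutines",
            "adapteeType": "VendorGraphics",
            "adapteeVariable": "lowLevelDriver",
            "field-decls": [],
            "method-decls": [["drawLine"], ["printText"]]}
generated = instantiate(template, bindings)
```

With these bindings, generated has the structure of the class in Figure 1: an empty field-decls sequence contributes nothing, and each element of method-decls is spliced into the methods form.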

Ellipses play an important role not only in laying out the notation, but also 
in defining the meaning of specifications. They elucidate the inductive structure 
of the specifications, an aspect that is often overlooked in pattern definitions. 
Specification implementors use ellipses wherever they want to leave the number 
of concrete entities unspecified and let the programmer provide this information. 
In the Visitor pattern, for instance, the programmer’s specification not only 
generates the visited type but also the visiting methods that characterize the 
pattern. Thus ellipses capture what Lauder and Kent term “purity” [19]: they 
reflect all instances of the pattern, not just a single deployment from which the 
programmer must extrapolate to the general case. 



3.2 Avoiding Inadvertent Interference in Code 

When a specification expands into code that co-mingles with code written by 
the programmer, one code fragment might inadvertently bind or use a name 
that is bound or used in the other fragment. Since neither the programmer nor 
specification designer can foresee all such situations, the elaborator must ensure 
that the programmer’s and the specification’s code do not inadvertently interfere
with the lexical properties of each other. 

Macro expansion can ensure that variable bindings and uses in user code do 
not bind and are not accidentally bound by bindings and uses in generated code, 
and vice versa. This process is called hygienic expansion [16]. In McMicMac, the 
hygienic macro expander works by marking each step of expansion with a dis- 
tinct “timestamp”, and accumulating timestamps on every term. To determine 
whether a variable is bound, the expander considers both the name and times- 
tamps on a variable, and identifies only those variables with the same name and 
timestamps. Since code from the user and from each macro elaboration have 
different timestamps, this prevents inadvertent capture of variables. 
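
The effect of timestamps can be illustrated with a deliberately simplified model. The sketch below is not the hygienic expansion algorithm of [16]: it stamps only the identifiers that a macro introduces, whereas a full implementation accumulates marks on every term. Ident, expand_with_dummy, and same_variable are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Ident:
    name: str
    stamps: frozenset = frozenset()   # timestamps accumulated during expansion

_clock = count(1)                     # source of distinct timestamps

def expand_with_dummy(fields):
    """Hypothetical macro step: add a dummy field next to the given fields."""
    stamp = next(_clock)
    return [Ident("dummy1", frozenset({stamp}))] + list(fields)

def same_variable(a, b):
    """Two identifiers denote the same variable only if name and stamps agree."""
    return a.name == b.name and a.stamps == b.stamps

user_field = Ident("dummy1")              # written by the programmer
once = expand_with_dummy([user_field])    # first use of the macro
twice = expand_with_dummy(once)           # second use of the same macro
```

All three identifiers in twice print as dummy1, yet no two of them are the same variable under same_variable, so neither the user's declaration nor a second use of the macro clashes with the introduced one.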

For example, consider Smaragdakis and Batory’s method for verifying mod- 
ular compositions [24] . Their specification mechanism introduces declarations of 
the following form (in C++, using our example from Section 2.2):

template < class Super > class Tag : public Super {
protected:
  class TagIncludedProperty {};
  ...}

is the Tag module, and

template < class Super > class GC : public Super {
private:
  TagIncludedProperty dummy1;
  ...}

is GC. The variable dummy1 here has two potential problems:

— It interferes with any other attempt to declare a variable of the same name.
Such a variable may be declared not only by the user, but also by some
other specification mechanism. Even if the specification designer chooses an
obscure name, some specifications can be applied multiple times, so one
instance of the specification can clash with another one.

— Some other part of the class’s body may accidentally mention the name
dummy1. This should produce an unbound variable error, but because of
the above declaration, it may produce an unexpected type error or, worse,
run without error and compute an incorrect answer.

Hygienic expansion ensures that the identifier dummy1 that is introduced by
the macro does not conflict with any other identifiers, including those named
dummy1 but coming from other sources (including other uses of the same macro).
This is essential when a program is developed by a team of programmers, who
would otherwise find it extremely difficult to prevent inadvertent name collisions.



3.3 Maintaining the Integrity of Specifications in Feedback 

A programming environment includes numerous tools that provide feedback 
about the program. These include type-checkers, program slicers, debuggers,
profilers, and so forth. In all these cases, the analyzed program includes both 
code written by the programmer and code generated by a specification. Even 
when these are co-mingled, feedback must be in terms of the source written by 
the programmer, not just in terms of the generated code. This is essential for two 
important reasons: 

— to maintain the impression that the programmer is coding at the level of the 
specification, not at the level of the code it generates; and, 

— because some of the code in question may not appear in the source text. 

Feedback of this form is particularly important when the juxtaposition of user 
and specification code results in complex errors. 

McMicMac provides an interface for tools that wish to report feedback. The 
interface consists of two kinds of information, and protocols that process this 
information. The two kinds of information are: 

source-correlation For each term, McMicMac notes the source location of the 
phrase that generated it. This information is maintained through the macro 
expansion process. Thus when a tool needs to provide feedback about a term,
it can report it in terms of the source phrase that the programmer needs to
examine. This is especially helpful to graphical interfaces, since they can 
highlight the appropriate source text. 

elaboration-tracking McMicMac also maintains a history of transformations 
applied to each elaborated term. In the example of Section 2.1, it records that 
GraphicalOutput is generated by Adapter (with its source location). In more 
complex examples, one specification might expand into the use of another. In 
these cases, conventional error messages in terms of the constituents might 
be especially unhelpful. 

The tools themselves do not need to process this information. For example, the 
McMicMac error checker processes only the Classic Java language, and has 
no knowledge of software patterns. Instead, the information is extracted and 
processed by a McMicMac protocol, which generates feedback of the sort shown 
in Section 2.1. 
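
The two kinds of information can be pictured as extra fields carried by every term. The sketch below is an assumption about one possible representation, not McMicMac's interface: Term, elaborate, and report are hypothetical names, and the message layout only approximates the feedback shown in Section 2.1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    form: object            # the syntax itself, e.g. a nested list
    source: str             # source-correlation: where the programmer wrote it
    history: tuple = ()     # elaboration-tracking: (macro name, use location)

def elaborate(term, macro_name, new_form):
    """Rewrite a term, keeping its source location and extending its history."""
    return Term(new_form, term.source,
                term.history + ((macro_name, term.source),))

def report(message, site, culprit):
    """Format feedback for a tool that knows nothing about the macros."""
    lines = [f"{site.source}: {message}"]
    for macro_name, where in reversed(culprit.history):
        lines.append(f"created by {macro_name} at {where}")
    return "\n".join(lines)

adapter_use = Term(["Adapter", "GraphicalOutput"], "test.cj:7.2-7.8")
generated_class = elaborate(adapter_use, "Adapter", ["class", "GraphicalOutput"])
assignment = Term(["var", "=", "new"], "test.cj:15.33-15.56")
message = report("Incompatible assignment type.", assignment, generated_class)
```

The checker only supplies the message and the two terms involved; the mention of Adapter comes entirely from the recorded history, which is why the checker itself needs no knowledge of software patterns.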



3.4 Organizing Specifications by Layers 

Specification designers need modular constructs to organize collections of spec- 
ifications. McMicMac provides a module mechanism called vocabularies. Each 
specification must be placed in some vocabulary; a vocabulary represents a lan- 
guage fragment, derived by combining the individual macros in that vocabulary. 
Vocabularies can be combined with other vocabularies. In particular, a vocabu- 
lary representing a specification can be combined with different base languages. 
These base languages might vary in at least two different ways: 

1. By layering several vocabularies atop a base language, the programmer can 
combine multiple specification schemes. This may be impossible to achieve 
when each specification method provides its own tool, since these tools may 
reject or incorrectly process the other forms of specification. 

2. Users can customize how they use the specification method by picking dif- 
ferent base languages. For instance, a sophisticated programmer will want a 
powerful base language, while teachers might prefer a simpler one for their 
students. 

A vocabulary also represents one way in which a specification analyzes a 
program; some specifications may process the same program fragment several 
times. For instance, many common program analyses and optimizations can be 
partitioned into “collection” and “modification” phases. The first phase traverses 
the program and collects information relevant to the analysis. This information 
is processed, and then used by the second phase, which rewrites the program 
appropriately. Each of these phases can be elegantly encoded as a vocabulary. 
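
As a rough picture of the two-phase organization, a vocabulary can be modeled as a table from head keywords to handlers, with one table per phase. This is a simplified model under our own assumptions, not the McMicMac vocabulary mechanism: walk, record_call, and inline_single_use are hypothetical, terms are nested lists, and only string heads are dispatched.

```python
def walk(term, vocabulary, state):
    """Rebuild a term bottom-up, applying the handler registered for its head."""
    if not isinstance(term, list) or not term:
        return term
    head = term[0]
    rebuilt = [head] + [walk(t, vocabulary, state) for t in term[1:]]
    handler = vocabulary.get(head) if isinstance(head, str) else None
    return handler(rebuilt, state) if handler else rebuilt

def record_call(term, state):
    # Collection phase: count how often each function is called.
    state[term[1]] = state.get(term[1], 0) + 1
    return term

def inline_single_use(term, state):
    # Modification phase: mark calls to functions that are called exactly once.
    return ["inline"] + term[1:] if state.get(term[1]) == 1 else term

collect = {"call": record_call}        # vocabulary for the first phase
modify = {"call": inline_single_use}   # vocabulary for the second phase

program = ["begin", ["call", "f"], ["call", "g"], ["call", "f"]]
counts = {}
walk(program, collect, counts)
rewritten = walk(program, modify, counts)
```

The same program fragment is processed twice, once per vocabulary, and the information gathered by the first phase (counts) drives the rewriting done by the second.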

3.5 Implementation Status 

McMicMac is currently implemented in MzScheme [9]. We have constructed 
vocabularies for several versions of Scheme [8] and one for CLASSIC Java [10], an 
idealized, parenthesized subset of Java. We have enhanced these base languages
with several structural transformers, including software patterns corresponding 
to each of the design patterns described in the catalog of Gamma, et al. [12], 
abstract structure operators [22], and execution profilers [1]. 

The facilities of McMicMac helped us implement all these tools in short 
order. Our implementation produces information that helps us generate useful 
and intuitive feedback for the programmer, such as the error messages shown 
in this paper. Though the current implementation of McMicMac is restricted to 
parenthesized syntax, it can be extended to more traditional syntaxes along the 
lines of Cardelli, et al.’s work [6]. 

4 Related Work 

Researchers have studied various specification mechanisms. We partition these 
into several categories below, though naturally there is some overlap. 

4.1 Software Patterns 

With the increasing popularity of software patterns, many researchers have pro- 
posed ways to design tools and languages that support them. 

Graphical and Translation Tools. Patterns are frequently described using graphi- 
cal notations. Researchers have thus tried to design graphical tools that support 
programming with patterns. Budinsky, et al. [5] describe a GUI-based pack- 
age which lets users prepare a diagrammatic representation of the pattern-level 
layout of their program, and then fill the components with source code. The 
system’s back-end uses a script to generate code from the user’s input. Florijn, 
et al. [11] have built browsers to support their work on fragments, described 
below. Meijler, et al. [21] have built a modeling environment called PAGE in 
which programmers use patterns by cutting-and-pasting from a library of pat- 
terns in UML, and then filling in the blanks with code. The PAGE environment 
is unusual in that it supports patterns throughout the program’s life, not just 
during its initial creation. 

Graphical tools are attractive because they can provide an intuitive interface 
to the programmer, but they have several shortcomings. 

1. They often do not clarify the distinction between a pattern, which can have 
a potentially unbounded number of components, and a pattern instance, 
which has a fixed, finite number. Often, they provide a template with a small 
number of placeholders for concrete entities, which the user must duplicate 
by hand to obtain the desired number; this process is both laborious and 
error-prone. 

2. Some contain an analysis engine for the pattern language, but none of them 
interact with the underlying programming language to provide an equally 
convenient interface for feedback once the patterns have been translated into 
code. This could be remedied by using a system such as ours as a back-end 
for the graphical tools. 

3. The system descriptions do not clarify how groups of patterns can be organized,
or how to extend tools to accommodate other specification notations.

Soukup [25] proposes implementing patterns through C-style macros. These 
macros lack inductive constructs, offer no facilities for reporting feedback, are 
not hygienic, and have no modular structure. Hedin [13] defines patterns in terms 
of attribute grammars. His work currently uses these to reconstruct patterns in 
existing source code. 



Constraint Models. Both Bosch [4] and Florijn, et al. [11] describe interest- 
ing approaches to pattern-based programming. Bosch takes a top-down view on 
programming with patterns. His system uses a layered object model to describe 
constraints on entities that participate in patterns. His paper also lays out four 
significant problems in implementing design patterns, of which our work ad- 
dresses three: traceability, reusability and implementation overhead. 

Programmers who use Florijn, et al.’s system write “fragments”, which are 
elements of programs and patterns. They then designate fragments to play roles 
in instances of patterns. The pattern designer specifies constraints (similar to 
contracts [14]) that judge when a collection of fragments satisfies the pattern. 
This approach gives the programmer the flexibility of restructuring the system 
easily. It is, however, unclear how their system handles feedback across elabora- 
tion or the inductive structure of patterns. 



4.2 Other Specification Implementations 

Adaptive programs [20] consist of declarations of class hierarchies, traversals, 
and visitors. The traversal maps the order in which to visit nodes in the hierar- 
chy, while the visitors determine what actions to perform at each node. Adaptive 
programs are compiled into programs in a conventional language like Java using
the Demeter tool [20]. Though it uses sophisticated algorithms to compile
specifications into efficient code, Demeter does not report feedback across the
elaboration process.

Aspect-Oriented Programming [15] is implemented by weavers, which com- 
bine the original program with code derived from the aspect specifications. The 
implementation for Java, called AspectJ, produces generic Java code that can be 
compiled by any generic compiler. As a result, however, it does not offer the user 
support at the level of aspect specifications. Instead, the primer [27] instructs 
users to track errors based on their file location. 

Batory and Geraci [2] describe a process for checking that code generated 
from specifications meets compositional requirements. Their work imposes a type 
system on a program’s components, and uses this system to verify the validity of 
compositions. When a composition is invalid, the system can suggest reasons for 
the error. Their work can be used to improve the error reporting for modular, 
collaborative designs [24].

4.3 Macro Systems and Extensible Grammars

McMicMac is a close relative (and descendent) of other language extension mech- 
anisms, including macro systems and extensible grammars. The derivation and 
use of McMicMac’s shape-matching mechanism was described by Kohlbecker 
and Wand [17]. The design of hygienic macro expanders is due to Kohlbecker, 
et al. [16], though our algorithm is derived from that of Dybvig, et al. [7]. The 
latter work also outlines a method for maintaining source-correlation. 

Like a macro system, Taha and Sheard’s MetaML [26] gives the user the 
ability to manipulate fragments of a program’s source as values. Their system 
performs type-checking to ensure that program compositions do not produce 
type errors, and verifies that phases of elaboration do not interfere with each 
other in ill-defined ways. Abstract structure operators [22], described in Sec- 
tion 2.3, were implemented using Compile-time Reflective ML (CRML) [23], a 
predecessor of MetaML. 

Cardelli, et al. [6] describe a system that is similar to modern macro systems 
but admits more flexible syntaxes. Their approach is based on extensible gram- 
mars and does not incorporate a direct equivalent of our ellipses. They impose a 
type discipline on terms, which they use to ensure that elaboration terminates. 
Though their system allows users to extend the grammar of a language, the ex- 
tensions are not defined in modular blocks (such as McMicMac’s vocabularies) 
that can be combined with different base languages. 

5 Conclusion and Future Work 

We have shown how modern macros can be used to extend programming lan- 
guages with constructs for specification mechanisms. Modern macro systems 
offer many features that make such extensions both convenient and powerful. 
First, macros support a simple and powerful shape-matching notation to define 
specifications. Second, macros automatically prevent insidious arrangements of 
code from affecting the lexical properties of both specification and user code. 
Third, macros can be grouped into modular constructs that make the macros 
more customizable and reusable. Most importantly, macros can track source 
locations and elaboration history and thus present users with helpful feedback, 
e.g., types, data flow and other messages, in terms of their source. This last issue, 
which is crucial for software development, is frequently ignored in the literature 
on software tools. 

We anticipate many future directions for this work. First, our framework 
makes it easy to add program-processing tools. We can build tools that analyze, 
verify and document programs by selectively processing specifications that they 
understand, and falling back on the standard expansions for unfamiliar ones. Sec- 
ond, we expect the framework can accommodate other paradigms like adaptive 
programming [20]. One vocabulary could collect the adaptive specifications and
program structure, while a second one would rewrite the program, removing the 
specifications and inserting code to perform traversals. Finally, by implementing
numerous platforms in this framework, we can study the properties that are
common to structural specifications. 

Acknowledgments We thank the anonymous referees for their detailed re- 

Abstract. A generic compact printer and a corresponding parser are
constructed. These programs transform values of any regular datatype 
to and from a bit stream. The algorithms are constructed along with a 
proof that printing followed by parsing is the identity. Since the binary 
representation is very compact, the printer can be used for compressing 
data - possibly supplemented with some standard algorithm for com- 
pressing bit streams. The compact printer and the parser are described 
in the polytypic Haskell extension PolyP. 



1 Introduction 

Many programs convert data from one format to another; examples are parsers, 
pretty printers, data compressors, encryptors, functions that communicate with 
a database, etc. Some of these programs, such as parsers and pretty printers, 
critically depend on the structure of the input data. Other programs, such as 
most data compressors and encryptors, more or less ignore the structure of the 
data. We claim that using the structure of the input data in a program for a 
data conversion problem almost always gives a more efficient program with better 
results. For example, a data compressor that uses the structure of the input data 
runs faster and compresses better than a conventional data compressor. This 
paper constructs (part of) a data compression program that uses the structure 
of the input data. 

A lot of files that are distributed around the world, either over the internet
or on CD-ROM, possess structure (examples are databases, HTML files, and
JavaScript programs), and it pays to compress these structured files to obtain
faster transmission or fewer CDs. Structure-specific compression methods give
much better compression results than conventional compression methods such as 
the Unix compress utility [3,17]. For example, Unix compress typically requires 
four bits per byte of Pascal program code, whereas Cameron [6] reports compres- 
sion results of one bit per byte of Pascal program code. Algorithmic Research 
B.V. [5] sells compressors for structured data, and reports impressive results. 
Structured compression is also used in heap compression and binary I/O [16]. 

The basic idea of the structure-specific compression methods is simple: parse 
the input file into a structured value (an abstract syntax tree), and construct a 
compact representation of the abstract syntax tree. For example, consider the
datatype of binary trees

    data Tree a = Leaf a | Bin (Tree a) (Tree a)

The following (rather artificial) example binary tree

    tree :: Tree ()
    tree = Bin (Bin (Leaf ()) (Bin (Leaf ()) (Leaf ()))) (Leaf ())

can be pretty printed to an (admittedly rather wasteful) text description of tree 
requiring 55 bytes. But since the datatype Tree a has two constructors, each 
constructor can be represented by a single bit. Furthermore, the datatype () has 
only one constructor so the single element can be represented by 0 bits. Thus 
we get the following representations: 

    Bin (Bin (Leaf ()) (Bin (Leaf ()) (Leaf ()))) (Leaf ())
    1    1    0         1    0         0           0

The compact representation consists of 7 bits, so only 1 byte is needed to store 
this tree. Of course, we are not always this lucky, but the average case is still 
very compact. 
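
The encoding above can be tried out with a small hand-written sketch for the single type Tree (). This is not the polytypic program of the paper; the names Bit, printTree and parseTree are made up for this illustration, and the parser returns a Maybe so that truncated input is reported rather than crashing.

```haskell
-- Illustration only: a hand-written compact printer and parser for Tree ().
-- Bit, printTree and parseTree are hypothetical names, not PolyP functions.
data Tree a = Leaf a | Bin (Tree a) (Tree a)
  deriving (Eq, Show)

data Bit = O | I
  deriving (Eq, Show)

-- One bit per constructor (Leaf = O, Bin = I); the () payload needs no bits.
printTree :: Tree () -> [Bit]
printTree (Leaf ()) = [O]
printTree (Bin l r) = I : printTree l ++ printTree r

-- The inverse direction: returns the tree and the unread rest of the input.
parseTree :: [Bit] -> Maybe (Tree (), [Bit])
parseTree (O : bs) = Just (Leaf (), bs)
parseTree (I : bs) = do
  (l, bs')  <- parseTree bs
  (r, bs'') <- parseTree bs'
  Just (Bin l r, bs'')
parseTree [] = Nothing

-- The example tree from the text.
tree :: Tree ()
tree = Bin (Bin (Leaf ()) (Bin (Leaf ()) (Leaf ()))) (Leaf ())
```

Applying printTree to tree yields the seven bits 1101000 shown above, and parseTree recovers the tree with no input left over.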

This idea has been around since the beginning of the 1980s, but as far as we 
are aware, there does not exist a general description of the program, only exam- 
ple instantiations appear in the literature. One of the goals of this paper is to 
describe the compact printing part, together with its inverse, of the compression 
program generically. It defines a polytypic program (a program that works for 
large classes of datatypes) for compact printing. Together with a parser genera- 
tor this program is a generic description of the structured compression program. 
The implementation (as PolyP code) can be obtained from 

    http://www.cs.chalmers.se/~patrikj/poly/

Our compact printing algorithm achieves compression through a compact
representation of the structure of the data, using only static information:
the type of the data. Traditional (bit stream) compressors using dynamic
(statistical) properties of the data are largely orthogonal to our approach, and
thus the best results are obtained by composing the compact printer with a bit
stream compressor.

The fundamental property of the compact printing function print is that it
has a left inverse¹: the parsing function parse. This is a very common
specification pattern: all of the example data conversion problems above are specified
as pairs of inverse functions with some additional properties. Another example
can be found in Haskell’s prelude, which contains functions show and its inverse
read of type:

    show :: Show a => a -> String
    read :: Read a => String -> a

¹ That is, parse . print = id, but print . parse need not be id. In the rest of the paper
we will write just inverse, when we really mean left inverse.

Unfortunately, it is very hard to see from their definitions why read is the inverse
of show. In this paper, the driving force behind the construction of the functions 
print and parse is inverse function construction. Thus correctness of print and 
parse is guaranteed by construction. Interestingly, when we forced ourselves to 
only construct pairs of inverse functions, we managed to reduce the size and 
complexity of the resulting program considerably compared with our previous 
attempts. 
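
The difference between an inverse and a left inverse can be observed with show and read at type Int. The following sketch assumes the standard Prelude instances; roundTrip and reprint are names introduced here for illustration only.

```haskell
-- Illustration only: show/read at type Int, using the standard Prelude.
-- roundTrip and reprint are hypothetical helper names.

-- read after show: expected to return its argument unchanged.
roundTrip :: Int -> Int
roundTrip = read . show

-- show after read: need not return the original string,
-- because read accepts more than one spelling of the same number.
reprint :: String -> String
reprint s = show (read s :: Int)
```

For example, reprint applied to a string with a leading space returns the number without the space, so show . read is not the identity on strings even though read . show is the identity on these values.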

A second desired property of the compact printing function is that given an 
element x, the length of print x is less than the length of prettyprint x, where 
prettyprint is a function that prints a value in a standard fashion, like the show 
function of Haskell. This is in general difficult or impossible to prove, and beyond
the scope of this paper. More information can be found in the literature [15]. 

Summarising, this paper has the following goals: 

— construct a polytypic compact printing program together with its inverse; 

— show how to construct and calculate with polytypic functions; 

— take a first step towards a theory of polytypic data conversion. 

This paper is organised as follows. Section 2 briefly introduces polytypic pro- 
gramming. Section 3 defines some basic types and classes, and introduces the 
compact printing program. Section 4 sketches the construction and correctness 
proof of the compact printing program. Section 5 concludes. Appendix A de- 
scribes the laws we need in the proofs. 

2 Polytypic programming 

The compact printing and parsing functions are polytypic functions. This section 
briefly introduces polytypic functions in the context of the Haskell extension 
PolyP [9], and defines some basic polytypic concepts used in the paper. We 
assume that the reader is familiar with the initial algebra approach to datatypes, 
and not completely unfamiliar with polytypic programming. For an introduction 
to polytypic programming, see [1,10]. 

A polytypic function is a function parametrised on type constructors. Poly- 
typic functions are defined either by induction on the structure of user-defined 
datatypes, or defined in terms of other polytypic (and non-polytypic) functions. 
In the definition of a function that works for an arbitrary (as yet unknown) 
datatype we cannot use the constructors to build values, nor to pattern match 
against values. Instead, we use two built-in functions, inn and out, to construct 
and destruct a value of an arbitrary datatype from and to its top level components.
With a recursive datatype d a as a fixed point of a pattern functor Φ_d,
inn and out are the fold and unfold isomorphisms showing d a ≅ Φ_d a (d a).

    inn :: Regular d => (FunctorOf d) a (d a) -> d a
    out :: Regular d => d a -> (FunctorOf d) a (d a)

The pattern functor is used to capture the (top level) structure of a datatype, for 
example, a list is either empty or contains one element and a recursive occurrence 
of a list. Hence: FunctorOf List = Empty + (Par * Rec). Similarly, the pattern
functor of the datatype Tree a is Par + (Rec * Rec). As a last example, the
datatype Rose a of rose trees over a:

    data Rose a = Node a (List (Rose a))

has the pattern functor FunctorOf Rose = Par * (List @ Rec), where @ denotes
functor composition. In general, PolyP’s pattern functors are generated by the
following grammar:

    f, g, h ::= g + h | g * h | Empty | Par | Rec | d @ g | Const t

where d generates regular datatype constructors, and t generates types. The
pattern functor Const t denotes a constant functor with value t. The type context
Bifunctor f => is used to indicate that f is a pattern functor.
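
To make the role of inn and out concrete, the pattern functor of lists, Empty + (Par * Rec), can be written out by hand in plain Haskell. This is only a sketch: PolyP derives the functor automatically, whereas the type ListF and the functions outList and innList below are hypothetical names introduced for this example.

```haskell
-- Illustration only (plain Haskell, not PolyP syntax): the pattern functor
-- of lists, Empty + (Par * Rec), as an ordinary two-parameter datatype.
data ListF a r = EmptyF | ConsF a r
  deriving (Eq, Show)

-- out exposes the top level structure of a list ...
outList :: [a] -> ListF a [a]
outList []       = EmptyF
outList (x : xs) = ConsF x xs

-- ... and inn rebuilds the list from its top level components.
innList :: ListF a [a] -> [a]
innList EmptyF       = []
innList (ConsF x xs) = x : xs
```

The two functions are mutually inverse, which is the isomorphism between a datatype and its pattern functor applied to the datatype itself.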

Using the polytypic construct, a polytypic function can be defined by induction
over the structure of pattern functors. As an example we take the function
psum defined in figure 1. (The subscripts indicating the type are included for
readability and are not part of the definition.)

    psum :: Regular d => d Int -> Int
    psum = fsum . fmap id psum . out

    polytypic fsum_f :: Bifunctor f => f Int Int -> Int
      = case f of
          g + h   -> either fsum_g fsum_h
          g * h   -> \(x, y) -> fsum_g x + fsum_h y
          Empty   -> \() -> 0
          Par     -> \n -> n
          Rec     -> \s -> s
          d @ g   -> psum . pmap fsum_g
          Const t -> \x -> 0

    pmap :: Regular d => (a -> b) -> d a -> d b
    fmap :: Bifunctor f => (a -> c) -> (b -> d) -> f a b -> f c d

Fig. 1. The definition of psum

Function psum sums the integers in a datatype with integers. We use the naming
convention that recursive polytypic functions start with a ‘p’ (as in polytypic)
but non-recursive polytypic definitions start with an ‘f’ (as in functor). The
‘polytypic map’, pmap, takes a function f :: a -> b and a value x :: d a, and
applies f to all a values in x, giving a value of type d b. The ‘functor map’,
fmap, takes two functions g :: a -> c and h :: b -> d and a value x :: f a b,
and applies g to all a values in x, and h to all b values in x, giving a value
of type f c d. The definitions of pmap and fmap can
be found in the distribution of PolyP.
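
The shape psum = fsum . fmap id psum . out can be followed by hand for one fixed datatype. The sketch below instantiates it for lists in plain Haskell; ListF, outList, fmapListF, fsumListF and psumList are hypothetical names, and PolyP would generate the corresponding instances instead of requiring them to be written out.

```haskell
-- Illustration only: the definition of psum unfolded by hand for lists.
-- All names ending in ListF or List are made up for this example.
data ListF a r = EmptyF | ConsF a r

outList :: [a] -> ListF a [a]
outList []       = EmptyF
outList (x : xs) = ConsF x xs

-- The functor map for Empty + (Par * Rec): g acts on parameters,
-- h acts on recursive positions.
fmapListF :: (a -> c) -> (b -> d) -> ListF a b -> ListF c d
fmapListF _ _ EmptyF      = EmptyF
fmapListF g h (ConsF x r) = ConsF (g x) (h r)

-- fsum for Empty + (Par * Rec): 0 for Empty, and the sum of the
-- parameter and the already summed recursive position for the product.
fsumListF :: ListF Int Int -> Int
fsumListF EmptyF      = 0
fsumListF (ConsF n s) = n + s

-- psum = fsum . fmap id psum . out, specialised to lists.
psumList :: [Int] -> Int
psumList = fsumListF . fmapListF id psumList . outList
```

Each recursive call first sums the tail (the Rec position) and then adds the head (the Par position).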

Note that function psum is only defined for Regular datatypes d a. A datatype
d a is regular (satisfies Regular d) if it contains no function spaces, and
if the argument of the type constructor d is the same on the left- and right-hand
side of its definition. In the rest of the paper we always assume that d a
is a regular datatype and that f is a pattern functor but we omit the contexts
(Regular d => or Bifunctor f =>) from the types for brevity.

3 Basic types and classes 

Compact printing. A natural choice for the type of a compact printing function
for type a is a -> Text, where Text is the type of printed values, for example
String or [Bit]. Since we want to define print as a recursive function, this would
lead to quadratic behaviour when repeatedly concatenating intermediate results.
The standard solution for printing functions is to add an accumulating parameter
(to which the output is prepended), thus changing the type to a -> Text -> Text,
or equivalently, to (a, Text) -> Text.
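
The effect of the accumulating parameter can be seen on a small example. The sketch below prints the leaves of a tree of characters in two ways; Text is taken to be String here, and printNaive and printAcc are names chosen for this illustration.

```haskell
-- Illustration only: printing with and without an accumulating parameter.
-- Text = String is an assumption made for this example.
data Tree a = Leaf a | Bin (Tree a) (Tree a)

type Text = String

-- Type a -> Text: every Bin node concatenates two intermediate results,
-- which can cost quadratic time on left-nested trees.
printNaive :: Tree Char -> Text
printNaive (Leaf c)  = [c]
printNaive (Bin l r) = printNaive l ++ printNaive r

-- Type a -> Text -> Text: output is prepended to the accumulator,
-- so the right child is printed first and no concatenation is needed.
printAcc :: Tree Char -> Text -> Text
printAcc (Leaf c)  acc = c : acc
printAcc (Bin l r) acc = printAcc l (printAcc r acc)
```

Both functions produce the same text when printAcc is started with the empty accumulator.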

Parsing. Parsing is the inverse of printing, and hence a first approximation of
its type is Text -> a. Since we want to apply parsers one after the other, we
need both a parsed result and the remaining part of the input string, which can
be passed to the next parser. The standard solution for parsing functions is to
change the type to Text -> (a, Text).

Side effects as functions. We can make the types for printing and parsing
more symmetric by pairing the single Text component with a unit type to
get the isomorphic type (a, Text) -> ((), Text) for printing and ((), Text) ->
(a, Text) for parsing. Both these types are instances of the more general type
TextStateArr a b:

    newtype TextStateArr a b = TS ((a, Text) -> (b, Text))

An element of type TextStateArr a b models a function that takes a value of
type a and returns a value of type b, and possibly has a side effect on the state
Text. Thus a compact printer (for a-values) has type TextStateArr a (), and a
corresponding parser has type TextStateArr () a.
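
A minimal sketch of this type in use is given below, for the element type Bool. It assumes Text = [Bool] as a stand-in for a bit stream; runTS, printBool, parseBool and andThen are hypothetical names, and the parser simply fails with an error on empty input.

```haskell
-- Illustration only: a printer and a parser as state-transforming functions.
-- Text = [Bool] is an assumption; the helper names are made up.
type Text = [Bool]

newtype TextStateArr a b = TS ((a, Text) -> (b, Text))

runTS :: TextStateArr a b -> (a, Text) -> (b, Text)
runTS (TS f) = f

-- Printer for Bool: consumes the value and prepends one bit to the state.
printBool :: TextStateArr Bool ()
printBool = TS (\(b, t) -> ((), b : t))

-- Parser for Bool: produces a value by removing one bit from the state.
parseBool :: TextStateArr () Bool
parseBool = TS (\((), t) -> case t of
                              b : t' -> (b, t')
                              []     -> error "parseBool: empty input")

-- Sequential composition: run the first arrow, then the second.
andThen :: TextStateArr a b -> TextStateArr b c -> TextStateArr a c
andThen (TS f) (TS g) = TS (g . f)
```

Running printBool followed by parseBool returns the original value and leaves the state unchanged, which is the inverse property required of a printer and its parser.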

The first steps. Our goal is to construct two functions and a proof: 

- A function pc (‘polytypic compacting’) that takes a compact printing program
on the element level a to a compact printing program on the datatype
level d a:

    pc :: TextStateArr a () -> TextStateArr (d a) ()

For example, the function that compresses the tree in the introductory section
is obtained by instantiating the polytypic function pc to Tree and applying
the instance to a (trivial) compact printing program for the type ().

- A function pu (‘polytypic uncompacting’) that takes a parsing program on
the element level a to a parsing program on the datatype level d a:

    pu :: TextStateArr () a -> TextStateArr () (d a)

For the Tree example the element level parsing program is a function that
parses nothing, and returns (), the value of type ().

- A proof that if c and u are inverses on the element level a, pc c and pu u are
inverses on the datatype level d a.

In the following section, instead of using the type TextStateArr a b in the
definitions of pc and pu, we will use the more abstract type a ~> b, where (~>) is an
arrow type constructor.

The class Arrow. The type TextStateArr a b encapsulates functions from a to 
b that manipulate a state of type Text. Since a parser could easily use a more 
complicated type, for example to store statically available information [14], and 
also the printer could use a more complicated type, we will go one step further 
in the abstraction by introducing the constructor class Arrow [8]:

    class Arrow (~>) where
      arr   :: (a -> b) -> (a ~> b)
      (>>>) :: (a ~> b) -> (b ~> c) -> (a ~> c)
      (<+>) :: (a ~> c) -> (b ~> d) -> (Either a b ~> Either c d)
      first :: (a ~> b) -> ((a, c) ~> (b, c))

The method arr of the class Arrow embeds functions as arrows, and arr id together
with (>>>) forms the signature of the category with types as objects, and
elements of a ~> b as arrows from a to b. This category has a binary (sum) functor
(the method (<+>)) and a “half-product” functor (first). Below we write f⃗
as a shorthand notation for arr f. In the appendix we formalise the properties
we need from Arrows to construct the definitions of functions pc and pu along
with the proof of their correctness.

As an example of programming with arrows, we define second, the other
half-product, in terms of first:

    second :: (a ~> b) -> ((c, a) ~> (c, b))
    second f = arr swap >>> first f >>> arr swap

    swap :: (a, b) -> (b, a)
    swap (a, b) = (b, a)

Using first and second we can define two candidates for being product functors,
but when the arrows have side-effects, neither of these is a functor, as both fail
to preserve composition.

    (*_i) :: (a ~> c) -> (b ~> d) -> ((a, b) ~> (c, d))
    f *_1 g = first f >>> second g
    f *_2 g = second g >>> first f
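
The order dependence of the two candidates can be observed with the Arrow class of today's Control.Arrow module in GHC's base library, which is a later relative of the class used in this paper and is used here only as a stand-in. The sketch uses Kleisli arrows over the pair monad ((,) String), so that every arrow appends to a log; the names Logged, f, g, prod1 and prod2 are chosen for this illustration.

```haskell
import Control.Arrow (Kleisli (..), first, second, (>>>))

-- Illustration only: arrows with a visible side effect (a log), built from
-- Kleisli arrows of the writer-like monad ((,) String) in base.
type Logged = Kleisli ((,) String)

f :: Logged Int Int
f = Kleisli (\x -> ("f", x + 1))

g :: Logged Int Int
g = Kleisli (\y -> ("g", y * 2))

-- The two product candidates: same result value, different effect order.
prod1, prod2 :: Logged (Int, Int) (Int, Int)
prod1 = first f >>> second g
prod2 = second g >>> first f
```

Applied to the same pair, prod1 and prod2 return the same components but different logs, so the choice of candidate fixes the order in which side effects happen.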

Type constructor TextStateArr can be made an instance of Arrow as follows:

    mapFst :: (a -> b) -> (a, c) -> (b, c)
    mapFst f (a, c) = (f a, c)

    instance Arrow TextStateArr where
      arr f         = TS (mapFst f)
      TS f >>> TS g = TS (g . f)
      TS f <+> TS g = TS (\(x, t) -> either (\a -> mapFst Left  (f (a, t)))
                                            (\b -> mapFst Right (g (b, t)))
                                            x)
      first (TS f)  = TS (\((a, c), t) -> let (b, t') = f (a, t)
                                          in ((b, c), t'))



Printing constructors. To construct the printer and the parser we need a little 
more structure than provided by the Arrow class - we need a way of handling 
constructors. Since a constructor can be coded by a single natural number, we 
can use a class ArrowNat to characterise arrows that have operations for printing 
and parsing constructor numbers: 

    class Arrow (~>) => ArrowNat (~>) where
      printCon :: Nat ~> ()
      parseCon :: () ~> Nat
      -- Requirement: printCon >>> parseCon = arr id

With Text = [Nat], the instances for TextStateArr are straightforward, and the 
printing algorithm constructed in the following section will in its simplest form 
just output a list of numbers given an argument tree of any type. A better solu- 
tion is to code these numbers as bits and here we have some choices on how to 
proceed. We could decide on a fixed maximal size for numbers and store them 
using their binary representation but, as most datatypes have few constructors, 
this would waste space. We will instead statically determine the number of con- 
structors in the datatype and code every single number in only as many bits 
as needed. For an n-constructor datatype we use just ⌈log₂ n⌉ bits to code a
constructor. An interesting effect of this coding is that the constructor of any 
single constructor datatype will be coded using 0 bits! We obtain better results 
if we use Huffman coding with equal probabilities for the constructors, resulting 
in a variable number of bits per constructor. Even better results are obtained 
if we analyse the datatype, and give different probabilities to the different con- 
structors. However, our goal is not to squeeze the last bit out of our data, but 
rather to show how to construct the polytypic program. Since the number of 
bits used per constructor depends on the type of the value that is compressed, 
printCon and parseCon need in general be polytypic functions. Their definitions 
are omitted, but can be found in the code on the web page for this paper. 
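
The fixed-width coding described above can be sketched in plain Haskell as follows. This is not the polytypic printCon and parseCon of the paper: the number of constructors n is passed explicitly, bits are modelled as Bool, and bitsFor, encodeCon and decodeCon are hypothetical names.

```haskell
-- Illustration only: coding a constructor number in ceiling (log2 n) bits.
-- The constructor count n is an explicit argument in this sketch.

-- Number of bits needed to distinguish n constructors (n >= 1).
bitsFor :: Int -> Int
bitsFor n = length (takeWhile (< n) (iterate (* 2) 1))

-- Encode constructor number k (0 <= k < n), most significant bit first.
encodeCon :: Int -> Int -> [Bool]
encodeCon n k = [ odd (k `div` 2 ^ i) | i <- reverse [0 .. bitsFor n - 1] ]

-- Decode a constructor number and return the unread rest of the input.
decodeCon :: Int -> [Bool] -> (Int, [Bool])
decodeCon n bs = (foldl (\acc b -> 2 * acc + fromEnum b) 0 used, rest)
  where
    (used, rest) = splitAt (bitsFor n) bs
```

With n = 1 the encoding is the empty list, which is the zero-bit case mentioned in the text; with n = 2 it is a single bit.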

In the sequel (~>) will always stand for an arrow type constructor in the class
ArrowNat but, as with Regular, we often omit the type context for brevity.

4 The construction of the program

We want to construct a function pc that takes a compact printing program on
the element level a to a compact printing program on the datatype level d a,
together with a parsing function pu, which takes a compact parsing program on
the element level a to a compact parsing program on the datatype level d a, and
a proof that pu is the inverse of pc:

    pc :: (a ~> ()) -> (d a ~> ())
    pu :: (() ~> a) -> (() ~> d a)

    c >>> u = id  ⇒  pc c >>> pu u = pid (c >>> u) = id        (1)

In the proofs below we will assume that the arrows c and u satisfy c >>> u = id.



Overview of the construction. The construction can be interpreted either
as fusing the printer pc c with the parser pu u to get an identity arrow id or,
equivalently, as splitting the identity arrow into a composition of a printer and
a parser. As both the printer and the parser are polytypic functions, and both
lift an argument level arrow to a datatype level arrow, we start by presenting a
polytypic “identity function” pid that lifts an element level identity arrow to a
datatype level identity arrow. Function pid is constructed below together with
pc and pu and the proof of equation 1 but the resulting definition is presented
already here, in figure 2, to aid the reading. The proof that pid id = id is simple
and omitted.

    pid :: (a ~> b) -> (d a ~> d b)
    pid i = arr out >>> fid i (pid i) >>> arr inn

    polytypic fid :: (a ~> c) -> (b ~> d) -> (f a b ~> f c d)
      = \i j -> case f of
          g + h -> fid i j <+> fid i j
          g * h -> (fid i j) *_1 (fid i j)
          Empty -> arr id
          Par   -> i
          Rec   -> j
          d @ g -> pid (fid i j)

Fig. 2. The definition of pid and fid.

As we are defining polytypic functions the construction follows the
structure of regular datatypes: A regular datatype is a fix-point of a pattern
functor, the pattern functor is a sum of products, and the products can involve
type parameters, other types, etc.

The arrow pc c prints a compact representation of a value of type d a. It 
does this by recursing over the value, printing each constructor by computing 
its constructor number, and each element by using the argument printer c. The
constructor number is computed by means of function fcSum, which also takes 
care of passing on the recursion to the children. An arrow printCon prints the 
constructor number with the correct number of bits. Finally, function fcProd 
makes sure the information is correctly threaded through the children. 

Top level recursion. We want function pc to be ‘on-line’ or lazy: it should 
output compactly printed data immediately, and given part of the compactly 
printed data, pu should reconstruct part of the input value. Thus functions pc 
and pu can also be used to compactly print infinite streams, for example. We 
have not been able to define function pc with a standard recursion operator such 
as the catamorphism: threading the side effects in the right order turned out to 
be a problem. Instead of a recursion operator we use explicit recursion on the 
top level, guided by fc and fu. 

As pc decomposes its input value, and compactly prints the constructor and 
the children by means of a function fc (defined below), pu must do the opposite: 
first parse the components using fu and then construct the top level value: 

    pc c = fc c (pc c) <<< arr out
    pu u = fu u (pu u) >>> arr inn

Here f <<< g = g >>> f (by definition) is used to reveal the symmetry of the
definitions. Thus we need two new functions, fc and fu, and we can already guess
that we will need a corresponding fusion law:

    fc :: (a ~> ()) -> (b ~> ()) -> (f a b ~> ())
    fu :: (() ~> a) -> (() ~> b) -> (() ~> f a b)

    fc c c' >>> fu u u' = fid (c >>> u) (c' >>> u')        (2)

We will use the following variant of fixed-point fusion [12,13]:²

    μf >>> μg = μh  ⇐  f c' >>> g u' = h (c' >>> u')        (3)

Given (2) we can now prove (1).

    pc c >>> pu u = pid (c >>> u)
  ⇐   Definitions of pc, pu; fixed-point theorem (3)
    arr out >>> fc c c' >>> fu u u' >>> arr inn = h (c' >>> u')
  ≡   Equation (2)
    arr out >>> fid (c >>> u) (c' >>> u') >>> arr inn = h (c' >>> u')
  ≡   Define h j = arr out >>> fid (c >>> u) j >>> arr inn
    True

The resulting definition of function pid can be found in figure 2.

² Strictly speaking the variables c' and u' on the right hand side of the implication
should be ∀-quantified over {f^i ⊥ | i ∈ N} and {g^i ⊥ | i ∈ N} respectively.

Printing constructors. We want to construct functions fc and fu such that (2)
holds. Furthermore, these functions should do the actual compact printing and
parsing of the constructors using printCon :: Nat ~> () and parseCon :: () ~> Nat
from the ArrowNat class:

    fc c c' = printCon <<< fcSum c c'
    fu u u' = parseCon >>> fuSum u u'

The arrow fcSum c c' prints a value (using the argument printers c and c' for the
parameters and the recursive structures, respectively) and returns the number
of the top level constructor, by determining the position of the constructor in
the pattern functor (a sum of products). The arrow printCon prepends the constructor
number to the output. As printCon >>> parseCon = arr id by assumption,
the requirement that function fu can be fused with fc is now passed on to fuSum
and fcSum:

    fcSum :: (a ~> ()) -> (b ~> ()) -> (f a b ~> Nat)
    fuSum :: (() ~> a) -> (() ~> b) -> (Nat ~> f a b)

    fcSum c c' >>> fuSum u u' = fid (c >>> u) (c' >>> u')        (4)

The arrow parseCon reads the constructor number and passes it on to the arrow 
fuSum u u' which selects the desired constructor and uses its argument parsers 
u and u' to fill in the parameter and recursive component slots in the functor 
value. 
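
The division of labour between printCon/parseCon and the argument printers and parsers can be illustrated with a small sketch. It is written in Python rather than PolyP, it is specialised to one hypothetical tree datatype instead of being polytypic, and it assumes a unary code for constructor numbers; none of these choices come from the paper. The names `print_con`, `parse_con`, `print_tree` and `parse_tree` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    pass


@dataclass
class Node:
    left: "Tree"
    value: bool
    right: "Tree"


Tree = Union[Leaf, Node]
Bits = List[int]


def print_con(n: int, out: Bits) -> None:
    # Write constructor number n in unary (n ones, then a zero).
    # The unary code is an assumption of this sketch.
    out.extend([1] * n + [0])


def parse_con(bits: Bits, pos: int) -> Tuple[int, int]:
    # Inverse of print_con: read ones up to the terminating zero.
    n = 0
    while bits[pos] == 1:
        n += 1
        pos += 1
    return n, pos + 1


def print_tree(t: Tree, out: Bits) -> None:
    # Constructor number first, then the components in order.
    if isinstance(t, Leaf):
        print_con(0, out)
    else:
        print_con(1, out)
        print_tree(t.left, out)
        out.append(int(t.value))
        print_tree(t.right, out)


def parse_tree(bits: Bits, pos: int = 0) -> Tuple[Tree, int]:
    # Read the constructor number, then fill in the component slots.
    con, pos = parse_con(bits, pos)
    if con == 0:
        return Leaf(), pos
    left, pos = parse_tree(bits, pos)
    value = bool(bits[pos])
    right, pos = parse_tree(bits, pos + 1)
    return Node(left, value, right), pos
```

Printing a tree and parsing the resulting bits returns the original tree and consumes every bit, which is the concrete counterpart of the fusion requirement (2) for this one datatype.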



Calculating constructor numbers. The pattern functor of a Haskell data- 
type with n constructors is an n-ary sum (of products) on the outermost level. 
This sum is in PolyP represented by a nested binary sum, which associates to 
the right. Consequently, we define fcSum by induction over the nested sum part 
of the pattern functor and defer the handling of the product part to fcProd : 

    polytypic fcSum :: (a ~> ()) -> (b ~> ()) -> (f a b ~> Nat)
      = \c c' -> case f of
          g + h -> (fcProd c c' <+> fcSum c c') >>> arr innNat
          g     -> fcProd c c' >>> arr (\() -> 0)

    polytypic fuSum :: (() ~> a) -> (() ~> b) -> (Nat ~> f a b)
      = \u u' -> case f of
          g + h -> (fuProd u u' <+> fuSum u u') <<< arr outNat
          g     -> fuProd u u' <<< arr (\0 -> ())

The types for fcProd and fuProd and the corresponding fusion law are unsurprising:

    fcProd :: (a ~> ()) -> (b ~> ()) -> (f a b ~> ())
    fuProd :: (() ~> a) -> (() ~> b) -> (() ~> f a b)

    fcProd c c' >>> fuProd u u' = fid (c >>> u) (c' >>> u')        (5)

We prove equation (4) by induction over the nested sum structure of the functor.
The induction hypothesis is that (4) holds for fcSum_h.

The sum case: g + h

    fcSum_{g+h} c c' >>> fuSum_{g+h} u u'
=     Definitions
    (fcProd_g c c' <+> fcSum_h c c') >>> arr innNat
      >>> arr outNat >>> (fuProd_g u u' <+> fuSum_h u u')
=     outNat . innNat = id
    (fcProd_g c c' <+> fcSum_h c c') >>> (fuProd_g u u' <+> fuSum_h u u')
=     (<+>) is a bifunctor
    (fcProd_g c c' >>> fuProd_g u u') <+> (fcSum_h c c' >>> fuSum_h u u')
=     Equation (5) and the induction hypothesis
    fid_g (c >>> u) (c' >>> u') <+> fid_h (c >>> u) (c' >>> u')
=     Define fid_{g+h}
    fid_{g+h} (c >>> u) (c' >>> u')

The base case: g

    fcProd_g c c' >>> arr (\() -> 0) >>> arr (\0 -> ()) >>> fuProd_g u u'
=     (\0 -> ()) . (\() -> 0) = id :: () -> ()
    fcProd_g c c' >>> fuProd_g u u'
=     Equation (5)
    fid_g (c >>> u) (c' >>> u')

Sequencing the parameters. The last part of the construction of the program 
is the two functions fcProd and fuProd defined in figure 3. The earlier functions 
have calculated and printed the constructors, so what is left is “arrow plumbing” . 
The arrow fcProd c c' traverses the top level structure of the data and inserts 
the correct compact printers: c at argument positions and c' at substructure 
positions. The structure of fuProd is very similar but as it is the inverse of fcProd,
all arrows are composed in the opposite order. The inverse proof is a relatively 
straightforward induction over the pattern functor structure, but omitted here 
due to space constraints.

    polytypic fcProd :: (a ~> ()) -> (b ~> ()) -> (f a b ~> ())
      = \c c' -> case f of
          g * h -> (fcProd c c') *2 (fcProd c c') >>> arr (\((), ()) -> ())
          Empty -> arr id
          Par   -> c
          Rec   -> c'
          d @ g -> pc (fcProd c c')

    polytypic fuProd :: (() ~> a) -> (() ~> b) -> (() ~> f a b)
      = \u u' -> case f of
          g * h -> (fuProd u u') «i (fuProd u u') <<< arr (\() -> ((), ()))
          Empty -> arr id
          Par   -> u
          Rec   -> u'
          d @ g -> pu (fuProd u u')

Fig. 3. The definition of fcProd and fuProd.



5 Conclusions 

Results 

— We have constructed a polytypic program for compact printing and parsing 
of structured data. As far as we are aware, this is the first generic description 
of a program for compact printing (structured data compression). 

— The pair of functions for compact printing and parsing are inverse functions 
by construction. Since we started applying the inverse function requirement 
rigorously in the construction of the program, the size and the complexity 
of the code have been reduced considerably. We think that such a rigorous 
approach is the only way to obtain elegant solutions to involved polytypic 
problems. 

Another concept that simplified the construction and form of the program 
is arrows. In our first attempts we used monads instead of arrows. Although 
it is perfectly well possible to construct the compact printing and parsing 
functions with monads [7], the inverse function construction, and hence the 
correctness proof, is much simpler with arrows. 

— We have shown how to convert data to and from a bit stream. This is an 
example of a data conversion program, and we hope that the construction 
in this paper is reusable in solutions for other data conversion problems. 



Future work

— The current program produces compact, but not human-readable, output. A 
pretty printer for structured data has a very similar structure, and we want 
to investigate how to introduce the right abstractions to obtain a single 
program for both pretty printing and compact printing of structured data.

— In the future we want to investigate whether or not relations can help to
simplify the construction even more, by specifying compact printing as a
relation, and letting parsing be its relational converse [2,4].

We have presented a calculation of a polytypic program. We think that 
calculating with polytypic functions is still rather cumbersome, and we hope 
to obtain more theory, in the style of [11], to further simplify calculations 
with polytypic programs. 

— We want to construct polytypic programs for other data conversion problems 
such as encryption and database communication. 



Acknowledgements. Roland Backhouse helped with the fixed point calcula- 
tion. Joost Halenbeek implemented a polytypic data compression program using 
monads. The anonymous referees suggested many improvements for contents 
and presentation.

A Properties of Arrows

The properties we need from an Arrow type constructor for the definitions of the
generic printer and parser are most succinctly described using category theoretic
terminology. We work in a base category C of (Haskell-) types and functions, and
a type constructor (~>) is an Arrow if we have a category A with the same types
as objects, but with elements of a ~> b as arrows:

    C = (H, (->), (.), id)
    A = (H, (~>), (<<<), id)

Furthermore A must have a binary (sum) functor ((<+>) : A × A → A),
“half-product” functors (first_c : A → A) and there must be a functor (arr : C → A)
lifting functions to arrows. A set of laws sufficient for the proof of the
correctness of the print-parse-pair is given in figure 4. We do not require the stronger



A is a category
    id <<< f = f = f <<< id                                   (<<<, id)
    (f <<< g) <<< h = f <<< (g <<< h)                         (<<<, <<<)

arr : C -> A
    arr a = a
    arr id = id                                               (arr, id)
    arr (f . g) = arr f <<< arr g                             (arr, <<<)

(<+>) : A × A -> A
    a <+> b = Either a b
    id <+> id = id                                            (<+>, id)
    (f <<< g) <+> (f' <<< g') = (f <+> f') <<< (g <+> g')     (<+>, <<<)

first_c : A -> A
    first_c a = (a, c)
    first_c id = id                                           (first_c, id)
    first_c (f <<< g) = first_c f <<< first_c g               (first_c, <<<)
    first_c (f <+> g) = first_c f <+> first_c g               (first_c, <+>)

Fig. 4. Laws for Arrows.



requirements that (<+>) should be a true categorical sum or that first_c (combined
with second_c) should give a categorical product as this would rule out
many useful arrow type constructors. In fact, the proof goes through even with
slightly weaker conditions on the arrows than those in figure 4, and thus we may
be able to extend the class of possible arrows further.

We denote reverse composition in A with (>>>) and we often use the obvious
variants of the laws for this operator. When translating the Arrow requirements
back to a Haskell class we omit id as it is equal to arr id. The resulting code is
shown in figure 5 where we also introduce some useful abbreviations.

    class Arrow a where
      arr    :: (b -> c) -> a b c
      (>>>)  :: a b c -> a c d -> a b d
      (|||)  :: a b d -> a c d -> a (Either b c) d
      (<+>)  :: a b d -> a c e -> a (Either b c) (Either d e)
      first  :: a b c -> a (b,d) (c,d)
      second :: a b c -> a (d,b) (d,c)

      -- Defaults:
      f <+> g  = (f >>> arr Left) ||| (g >>> arr Right)
      f ||| g  = (f <+> g) >>> arr (either id id)
      second f = arr swap >>> first f >>> arr swap
      first f  = arr swap >>> second f >>> arr swap

    -- Utilities
    (<<<) :: Arrow a => a c d -> a b c -> a b d
    g <<< f = f >>> g

    swap :: (a,b) -> (b,a)
    swap ~(x,y) = (y,x)

    data Nat = Z | S Nat

    innNat :: Either () Nat -> Nat
    innNat = either (const Z) S

    outNat :: Nat -> Either () Nat
    outNat Z     = Left ()
    outNat (S n) = Right n

    -- Either and either are predefined in Haskell
    data Either a b = Left a | Right b

    either :: (a -> c) -> (b -> c) -> Either a b -> c
    either f g (Left x)  = f x
    either f g (Right x) = g x

Fig. 5. The Arrow operations as Haskell code.

Abstract. Dynamic programming is an important algorithm design
technique. It is used for solving problems whose solutions involve recur- 
sively solving subproblems that share subsubproblems. While a straight- 
forward recursive program solves common subsubproblems repeatedly 
and often takes exponential time, a dynamic programming algorithm 
solves every subsubproblem just once, saves the result, reuses it when 
the subsubproblem is encountered again, and takes polynomial time. This 
paper describes a systematic method for transforming programs written 
as straightforward recursions into programs that use dynamic program- 
ming. The method extends the original program to cache all possibly 
computed values, incrementalizes the extended program with respect to 
an input increment to use and maintain all cached results, prunes out 
cached results that are not used in the incremental computation, and uses 
the resulting incremental program to form an optimized new program. In- 
crementalization statically exploits semantics of both control structures 
and data structures and maintains as invariants equalities characterizing 
cached results. The principle underlying incrementalization is general for 
achieving drastic program speedups. Compared with previous methods 
that perform memoization or tabulation, the method based on incremen- 
talization is more powerful and systematic. It has been implemented and 
applied to numerous problems and succeeded on all of them. 



1 Introduction 

Dynamic programming is an important technique for designing efficient algo- 
rithms [2,46,14]. It is used for problems whose solutions involve recursively solv- 
ing subproblems that overlap. While a straightforward recursive program solves 
common subproblems repeatedly, a dynamic programming algorithm solves ev- 
ery subproblem just once, saves the result in a table, and reuses the result when 
the subproblem is encountered again. This can reduce the time complexity from 
exponential to polynomial. The technique is generally applicable to all problems 
whose efficient solutions involve memoizing results of subproblems [4,5]. 

Given a straightforward recursion, there are two traditional ways to achieve 
the effect of dynamic programming [14]: memoization [34] and tabulation [5]. 

Memoization uses a mechanism that is separate from the original program to 
save the result of each function call or reduction [34,19,22,35,24,43,45,39,25,18,1].

* This work is supported in part by NSF under Grant CCR-9711253 and ONR under
Grant N0014-99-1-0132.

The idea is to keep a separate table of solutions to subproblems, modify recursive
calls to first look up in the table, and then, if the subproblem has been
computed, use the saved result, otherwise, compute it and save the result in the 
table. This method has two advantages. First, the original recursive program 
needs virtually no change. The underlying interpretation mechanism takes care 
of the table filling and lookup. Second, only values needed by the original pro- 
gram are actually computed, which is optimal in a sense. Memoization has two 
disadvantages. First, the mechanism for table filling and lookup has an interpre- 
tive overhead. Second, no general strategy for table management is efficient for 
all problems. 
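
The table lookup described above can be sketched as follows. The example is written in Python and uses the longest-common-subsequence recurrence from Section 2; the function name `lcs_memo` and the dictionary-based table are assumptions of this illustration, not a mechanism taken from the cited memoization systems.

```python
from typing import Dict, Tuple


def lcs_memo(x: str, y: str) -> int:
    # Separate table of solved subproblems, keyed by the arguments (i, j).
    table: Dict[Tuple[int, int], int] = {}

    def c(i: int, j: int) -> int:
        # Look up first; compute and save only on a miss.
        if (i, j) in table:
            return table[(i, j)]
        if i == 0 or j == 0:
            result = 0
        elif x[i - 1] == y[j - 1]:  # Python strings are 0-indexed
            result = c(i - 1, j - 1) + 1
        else:
            result = max(c(i, j - 1), c(i - 1, j))
        table[(i, j)] = result
        return result

    return c(len(x), len(y))
```

The body of `c` is the original recursion plus the lookup and the save, and only subproblems that the recursion actually reaches are entered in the table. Every call still pays for hashing and probing the table, which is the overhead mentioned above.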

Tabulation determines what shape of table is needed to store the values of 
all possibly needed subcomputations, introduces appropriate data structures for 
the table, and computes the table entries in a bottom-up fashion so that the so- 
lution to a superproblem is computed using available solutions to subproblems 
[5,13,40,39,10,12,41,42,21,11]. This overcomes both disadvantages of memoiza- 
tion. First, table filling and lookup are compiled into the resulting program so 
no separate mechanism is needed for the execution. Second, strategies for ta- 
ble filling and lookup can be specialized to be efficient for particular problems. 
However, tabulation has two drawbacks. First, it usually requires a thorough 
understanding of the problem and a complete manual rewrite of the program 
[14]. Second, to statically ensure that all values possibly needed are computed 
and stored, a table that is larger than necessary is often used; it may also include 
solutions to subproblems not actually needed in the original computation. 
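
For comparison, a hand-written tabulation of the same longest-common-subsequence recurrence might look like the sketch below. It is in Python; the name `lcs_table` and the list-of-lists table are assumptions of this illustration.

```python
from typing import List


def lcs_table(x: str, y: str) -> int:
    n, m = len(x), len(y)
    # The table shape is fixed in advance: one entry per pair (i, j).
    c: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    # Bottom-up order: every entry on the right-hand side is already filled.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[n][m]
```

The loop fills all (n+1)(m+1) entries, including entries that the recursive program would never request when many characters match. That is the second drawback named above, and the loop order had to be worked out by hand from the recurrence.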

This paper presents a powerful method that statically analyzes and trans- 
forms straightforward recursive programs to efficiently cache and use the results 
of needed subproblems at appropriate program points in appropriate data struc- 
tures. The method has three steps: (1) extend the original program to cache 
all possibly computed values, (2) incrementalize the extended program, with re- 
spect to an input increment, to use and maintain all cached results, (3) prune 
out cached results that are not used in the incremental computation, and fi- 
nally use the resulting incremental program to form an optimized program. The 
method overcomes both drawbacks of tabulation. First, it consists of static pro- 
gram analyses and transformations that are general and automatable. Second, it 
stores only values that are necessary for the optimization; it also shows exactly 
when and where subproblems not in the original computation are necessarily 
included. 

Our method is based on static analyses and transformations studied pre- 
viously by others [52,9,48,6,36,20,49,41] and ourselves [33,32,31,27,32] and im- 
proves them. Yet, all three steps are simple, automatable, and efficient and have 
been implemented in a prototype system, CACHET. The system has been used 
to optimize many programs written as straightforward recursions, including all 
dynamic programming problems found in [2,46,14]. Performance measurements 
confirm drastic asymptotic speedups. 



2 Formulating the problem 

Straightforward solutions to many combinatorics and optimization problems 
can be written as simple recursions [46,14]. For example, the
matrix-chain-multiplication problem [14, pages 302-314] computes the minimum number of
scalar multiplications needed by any parenthesization in multiplying a chain of
n matrices, where matrix i has dimensions $p_{i-1} \times p_i$. This can be computed as
m(1, n), where m(i, j) computes the minimum number of scalar multiplications
for multiplying matrices i through j and can be defined as: for $i \le j$,

\[
m(i,j) =
\begin{cases}
0 & \text{if } i = j \\
\min_{i \le k \le j-1} \{\, m(i,k) + m(k+1,j) + p_{i-1} * p_k * p_j \,\} & \text{otherwise}
\end{cases}
\]

The longest-common-subsequence problem [14, pages 314-320] computes the
length c(n, m) of the longest common subsequence of two sequences
$(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_m)$, where c(i, j) can be defined as:
for $i, j \ge 0$,

\[
c(i,j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
c(i-1,j-1) + 1 & \text{if } i \ne 0 \text{ and } j \ne 0 \text{ and } x_i = y_j \\
\max(c(i,j-1),\, c(i-1,j)) & \text{otherwise}
\end{cases}
\]

Both of these examples are literally copied from the textbook by Cormen, Leis- 
erson, and Rivest [14]. 

These recursive functions can be written straightforwardly in the following
first-order, call-by-value functional programming language. A program is a
function $f_0$ defined by a set of mutually recursive functions of the form

    f(v_1, ..., v_n) = e

where an expression e is given by the grammar

    e ::= v                              variable
        | c(e_1, ..., e_n)               constructor application
        | p(e_1, ..., e_n)               primitive function application
        | f(e_1, ..., e_n)               function application
        | if e_1 then e_2 else e_3       conditional expression
        | let v = e_1 in e_2             binding expression



We include arrays as variables and use them for indexed access such as $x_i$ and
$p_j$ above. For convenience, we allow global variables to be implicit parameters
to functions; such variables can be identified easily for our language even if they 
are given as explicit parameters. Fig. 1 gives programs for the examples above. 
Invariants about an input are not part of a program but are written explicitly to 
be used by the transformations. These examples do not use data constructors, 
but our previous papers contain a number of examples that use them [33,32,31] 
and our method handles them. 

These straightforward programs repeatedly solve common subproblems and 
take exponential time. We transform them into dynamic programming algo- 
rithms that perform efficient caching and take polynomial time. 

We use an asymptotic cost model for measuring time complexity. Assuming 
that all primitive functions take constant time, we need to consider only values 
of function applications as candidates for caching. Caching takes extra space, 
which reflects the well-known trade-off between time and space. Our primary 
goal is to improve the asymptotic running time of the program. Our secondary 
goal is to save space by caching only values useful for achieving the primary goal. 

Caching requires appropriate data structures. In Step 1, we cache all possibly 
computed results in a recursive tree following the structure of recursive calls.

    c(i,j) where i,j ≥ 0
      = if i = 0 ∨ j = 0 then 0
        else if x[i] = y[j] then c(i-1, j-1) + 1
        else max(c(i, j-1), c(i-1, j))

    m(i,j) where i ≤ j
      = if i = j then 0
        else msub(i, j, i)

    msub(i,j,k) where i ≤ k ≤ j-1
      = let s = m(i,k) + m(k+1,j) + p[i-1] * p[k] * p[j] in
        if k+1 = j then s
        else min(s, msub(i, j, k+1))

Fig. 1. Example programs.
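
To see the repeated subproblems concretely, the programs of Fig. 1 can be transliterated into Python and instrumented with a call counter. This is only a sketch: the wrapper `make_programs`, the `calls` counter, and the convention of padding the arrays x and y with a dummy element at index 0 (so that indices start at 1 as in the figure) are assumptions of this illustration.

```python
from collections import Counter
from typing import List, Optional


def make_programs(p: List[int], x: List[Optional[str]], y: List[Optional[str]]):
    # p, x and y play the role of the global variables of Fig. 1.
    calls: Counter = Counter()

    def c(i: int, j: int) -> int:
        calls[("c", i, j)] += 1
        if i == 0 or j == 0:
            return 0
        if x[i] == y[j]:
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    def m(i: int, j: int) -> int:
        calls[("m", i, j)] += 1
        if i == j:
            return 0
        return msub(i, j, i)

    def msub(i: int, j: int, k: int) -> int:
        s = m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
        if k + 1 == j:
            return s
        return min(s, msub(i, j, k + 1))

    return c, m, calls
```

After a call such as `m(1, 6)`, inspecting `calls` shows that small subproblems such as m(1, 1) are solved many times, which is the behaviour the transformation removes.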



Each node of the tree is a tuple that bundles recursive subtrees with the return 
value of the current call. We use <> to denote a tuple, and we use selectors 1st, 
2nd, 3rd, etc. to select the first, second, third, etc. elements of a tuple. 

In Step 2, cached values are used and maintained in efficiently computing
function calls on slightly incremented inputs. We use an infix operation $\oplus$ to
denote an input increment operation, also called an input change (or update)
operation. It combines a previous input $x = (x_1, \ldots, x_n)$ and an increment
parameter $y = (y_1, \ldots, y_m)$ to form an incremented input
$x' = (x'_1, \ldots, x'_n) = x \oplus y$, where each $x'_i$ is some function of the
$x_j$'s and $y_k$'s. An input increment operation
we use for program optimization always has a corresponding decrement operation
prev such that for all x, y, and x', if $x' = x \oplus y$ then x = prev(x').
Note that y need not be used. For example, an input increment operation to
function m in Fig. 1 could be $(x'_1, x'_2) = (x_1, x_2 + 1)$ or
$(x'_1, x'_2) = (x_1 - 1, x_2)$, and the corresponding decrement operations are
$(x_1, x_2) = (x'_1, x'_2 - 1)$ and $(x_1, x_2) = (x'_1 + 1, x'_2)$, respectively.
An input increment to a function that takes
a list could be x' = cons(y, x), and the corresponding decrement operation is
x = cdr(x').
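
The relation between an increment and its decrement can be written out directly. The sketch below is in Python; all function names are invented for this illustration, and Python lists stand in for cons lists.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def inc_second(x: Pair) -> Pair:
    # (x1', x2') = (x1, x2 + 1)
    return (x[0], x[1] + 1)


def prev_second(x_new: Pair) -> Pair:
    # (x1, x2) = (x1', x2' - 1)
    return (x_new[0], x_new[1] - 1)


def inc_first(x: Pair) -> Pair:
    # (x1', x2') = (x1 - 1, x2)
    return (x[0] - 1, x[1])


def prev_first(x_new: Pair) -> Pair:
    # (x1, x2) = (x1' + 1, x2')
    return (x_new[0] + 1, x_new[1])


def cons(y: int, x: List[int]) -> List[int]:
    # List increment; here the increment parameter y is actually used.
    return [y] + x


def cdr(x_new: List[int]) -> List[int]:
    # List decrement; it recovers x without needing y.
    return x_new[1:]
```

In each pair the decrement undoes the increment, so x can always be recovered from x' alone. For the two arithmetic increments the parameter y is not used at all.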

In Step 3, cached values that are not used for an incremental computation are 
pruned away, yielding functions that cache, use, and maintain only useful values. 
Finally, the resulting incremental program is used to form an optimized program. 
Our optimization preserves the semantics in the sense that if the original program 
terminates with a value, the optimized program terminates with the same value.

For a function f in an original program, $\bar{f}$ denotes the function that caches
all possibly computed values of f, and $\hat{f}$ denotes the pruned function that caches
only useful values. We use x to denote an un-incremented input and use r, $\bar{r}$,
and $\hat{r}$ to denote the return values of f(x), $\bar{f}(x)$, and $\hat{f}(x)$,
respectively. For any function g, we use g' to denote the incremental function that
computes g(x'), where $x' = x \oplus y$, using cached results about x such as g(x).
So, g' may take parameter x', as well as extra parameters each corresponding to a
cached result. Fig. 2 summarizes the notation.



3 Step 1: Caching all possibly computed values 

Consider a function $f_0$ defined by a set of recursive functions. Program $f_0$ may
use global variables, such as x and y in function c(i, j). A possibly computed value
is the value of a function call that is computed for some but not necessarily all
values of the global variables. For example, function c(i, j) computes the value
of c(i-1, j-1) only when x[i] = y[j]. Such values occur exactly in branches of
conditional expressions whose conditions depend on any global variable.

Fig. 2. Notation.

We construct a program $\bar{f}_0$ that caches all possibly computed values in $f_0$. For
example, we extend c(i, j) to always compute the value of c(i-1, j-1) regardless
of whether x[i] = y[j]. We first apply a simple hoisting transformation to lift
function calls out of conditional expressions whose conditions depend on global
variables. We then apply an extension transformation to cache all intermediate
results, i.e., values of all function calls, in the return value.

Hoisting transformation. Hoisting transformation Hst identifies conditional
expressions whose condition depends on any global variable and then applies the
transformation

    Hst[if e_1 then e_2 else e_3] = let v_2 = e_2 in
                                    let v_3 = e_3 in
                                    if e_1 then v_2 else v_3

For example, the hoisting transformation leaves m and msub unchanged and
transforms c into

    c(i,j) = if i = 0 ∨ j = 0 then 0
             else let u_1 = c(i-1, j-1) + 1 in
                  let u_2 = max(c(i, j-1), c(i-1, j)) in
                  if x[i] = y[j] then u_1 else u_2

Hst simply lifts up the entire subexpressions in the two branches, not just the 
function calls in them. Administrative simplification performed at the end of the 
extension transformation will unwind bindings for computations that are used at 
most once in subsequent computations; thus computations other than function 
calls will be put down into the appropriate branches then. Hst is simple and 
efficient. The resulting program has essentially the same size as the original pro- 
gram, so Hst does not increase the running time of the extension transformation 
or the running times of the later incrementalization and pruning. 
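
A Python rendering of the hoisted function c makes the effect visible. This is a sketch only: the wrapper `make_c_hoisted` and the 1-based padding of x and y are assumptions of this illustration.

```python
from typing import List, Optional


def make_c_hoisted(x: List[Optional[str]], y: List[Optional[str]]):
    # x and y are the global arrays, padded so that indices start at 1.
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 0
        # Both branch expressions are bound before the test on x[i] = y[j],
        # so all three recursive calls are made regardless of its outcome.
        u1 = c(i - 1, j - 1) + 1
        u2 = max(c(i, j - 1), c(i - 1, j))
        return u1 if x[i] == y[j] else u2

    return c
```

The returned value is unchanged, but the hoisted version performs three recursive calls at every non-base call, so it should only be run on small inputs. The test i = 0 ∨ j = 0 does not depend on a global variable and is therefore left in place.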

If we apply the hoisting transformation on arbitrary conditional expressions, 
the resulting program may run slower, become non-terminating, or have errors 
introduced. For conditional expressions whose conditions depend on global vari- 
ables, we assume that both branches may be executed to terminate correctly 
regardless of the condition, which holds for the large class of combinatorics and 
optimization problems we handle. By limiting the hoisting transformation to
these conditional expressions, we eliminate the last two problems. The first
problem is discussed in Section 6.

Extension transformation. For each hoisted function definition
$f(v_1, \ldots, v_n) = e$, we construct a function definition

    $\bar{f}(v_1, \ldots, v_n) = Ext[e]$        (1)

where Ext[e], defined in [32], extends an expression e to return a nested tuple
that contains the values of all function calls made in computing e, i.e., it
examines subexpressions of e in applicative order, introduces bindings that name
the results of function calls, builds up tuples of these values together with the
values of the original subexpressions, and passes these values from
subcomputations to enclosing computations. The first component of a tuple corresponds
to an original return value. Next, administrative simplifications clean up the
resulting program. This yields a program $\bar{f}_0$ that embeds values of all possibly
computed function calls in its return value. For the hoisted programs m and c,
the extension transformation produces the following functions:

    m̄(i,j) = if i = j then <0>
             else m̄sub(i, j, i)

    m̄sub(i,j,k) = let v1 = m̄(i, k) in
                  let v2 = m̄(k+1, j) in
                  let s = 1st(v1) + 1st(v2) + p[i-1] * p[k] * p[j] in
                  if k+1 = j then <s, v1, v2>
                  else let v = m̄sub(i, j, k+1) in
                       <min(s, 1st(v)), v1, v2, v>

    c̄(i,j) = if i = 0 ∨ j = 0 then <0>
             else let v1 = c̄(i-1, j-1) in
                  let v2 = c̄(i, j-1) in
                  let v3 = c̄(i-1, j) in
                  if x[i] = y[j] then <1st(v1) + 1, v1, v2, v3>
                  else <max(1st(v2), 1st(v3)), v1, v2, v3>
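
The extended function for c can be mirrored in Python with ordinary tuples, where `r[0]` plays the role of 1st(r). The wrapper `make_c_ext` and the 1-based padding of x and y are assumptions of this sketch.

```python
from typing import List, Optional


def make_c_ext(x: List[Optional[str]], y: List[Optional[str]]):
    def c_ext(i: int, j: int) -> tuple:
        if i == 0 or j == 0:
            return (0,)
        # Results of all three calls are named and kept in the return value.
        v1 = c_ext(i - 1, j - 1)
        v2 = c_ext(i, j - 1)
        v3 = c_ext(i - 1, j)
        if x[i] == y[j]:
            return (v1[0] + 1, v1, v2, v3)
        return (max(v2[0], v3[0]), v1, v2, v3)

    return c_ext
```

The first component is the original return value and the remaining components are the complete cached subtrees, so the value of c(i, j-1), for example, can be read off as `c_ext(i, j)[2][0]` without recomputation.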



4 Step 2: Static incrementalization 

The essence of our method is to use and maintain cached values efficiently as
a computation proceeds, i.e., we incrementalize $\bar{f}_0$ with respect to an input
increment operation $\oplus$. Precisely, we transform $\bar{f}_0(x \oplus y)$ to use the
cached value of $\bar{f}_0(x)$ rather than compute from scratch.

An input increment operation $\oplus$ corresponds to a minimal update to the
input parameters. We first describe a general method for identifying $\oplus$. We
then give a powerful method, called static incrementalization, that constructs an
incremental version f' for each function f in the extended program and allows an
incremental function to have multiple parameters that represent cached values.

Input increment operation. An input increment should reflect how a compu- 
tation proceeds. In general, a function may have multiple ways of proceeding 
depending on the particular computations involved. There is no general method 
for identifying all of them or the most appropriate ones. Here we propose a 
method that can systematically identify a general class of them. The idea is to 
use a minimal input change that is in the opposite direction of change compared 




294 Yanhong A. Liu and Scott D. Stoller 



to arguments of recursive calls. Using the opposite direction of change yields 
an increment; using a minimal change allows maximum reuse, i.e., maximum 
incrementality. 

Consider a recursively defined function $f_0$. Formulas for the possible
arguments of recursive calls to $f_0$ in computing $f_0(x)$ can be determined statically. For
example, for function c(i, j), recursive calls to c have the set of possible arguments
$S_c = \{(i-1, j-1), (i, j-1), (i-1, j)\}$, and for function m(i, j), recursive calls
to m have the set of possible arguments $S_m = \{(i, k), (k+1, j) \mid i \le k \le j-1\}$.
The latter is simplified from
$S_m = \{(a, c), (c+1, b) \mid a \le c \le b-1,\ a = i,\ b = j\}$
where a, b, c are fresh variables that correspond to i, j, k in msub; the equalities
are based on arguments of the recursive calls involved (in this case msub); and
the inequalities are obtained from the inequalities on these arguments. The
simplification here, as well as the manipulations below, can be done automatically
using Omega [44].

Represent the arguments of recursive calls so that the differences between
them and x are explicit. For function c, $S_c$ is already in this form, and for
function m, $S_m$ is rewritten as $\{(i, j-l), (i+l, j) \mid 1 \le l \le j-i\}$. Then, extract
minimal differences that cover all of these recursive calls. The partial ordering
on differences is: a difference involving fewer parameters is smaller; a difference
in one parameter with smaller magnitude is smaller; other differences are
incomparable. A set of differences covers a recursive call if the argument to the call can
be obtained by repeated application of the given differences. So, we first compute
the set of minimal differences and then remove from it each element that is
covered by the remaining elements. For function c, we obtain $\{(i, j-1), (i-1, j)\}$,
and for function m, we obtain $\{(i, j-1), (i+1, j)\}$. Elements of this set represent
decrement operations. Finally, take the opposite of each decrement operation to
obtain an increment operation $\oplus$, introducing a parameter y if needed (e.g., for
increments that use data constructions). For function c, we obtain (i, j+1) and
(i+1, j), and for function m, we obtain (i, j+1) and (i-1, j). Even though
finding input increment operations is theoretically hard in general (and a
decrement operation might not have an inverse, in which case our algorithm does not
apply), it is usually straightforward.
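
For integer parameters, the procedure above can be sketched in a few lines of Python. Differences are represented as offset vectors relative to the current arguments, e.g. (0, -1) for (i, j-1). The reading of the partial order (magnitudes are compared only within the same parameter and direction), the bounded search in `covers`, and all function names are assumptions of this sketch; the paper performs these manipulations symbolically with Omega.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Diff = Tuple[int, ...]


def support(d: Diff) -> Set[int]:
    # Indices of the parameters that the difference actually changes.
    return {k for k, v in enumerate(d) if v != 0}


def smaller(d: Diff, e: Diff) -> bool:
    sd, se = support(d), support(e)
    if len(sd) < len(se):
        return True  # fewer parameters involved
    if len(sd) == 1 and sd == se:
        (k,) = sd
        return d[k] * e[k] > 0 and abs(d[k]) < abs(e[k])
    return False  # otherwise incomparable


def minimal(diffs: Iterable[Diff]) -> Set[Diff]:
    ds = set(diffs)
    return {d for d in ds if not any(smaller(e, d) for e in ds)}


def covers(basis: Iterable[Diff], target: Diff, bound: int = 8) -> bool:
    # Is target reachable by repeated application of the basis differences?
    # A bounded brute-force search is enough for a small illustration.
    basis = list(basis)
    for counts in product(range(bound + 1), repeat=len(basis)):
        total = tuple(
            sum(n * b[k] for n, b in zip(counts, basis))
            for k in range(len(target))
        )
        if total == target:
            return True
    return False


def decrement_ops(diffs: Iterable[Diff]) -> Set[Diff]:
    kept = minimal(diffs)
    for d in sorted(kept):
        rest = kept - {d}
        if rest and covers(rest, d):
            kept = rest  # d is covered by the remaining elements
    return kept


def increment_ops(diffs: Iterable[Diff]) -> Set[Diff]:
    # The opposite of each decrement operation.
    return {tuple(-v for v in d) for d in decrement_ops(diffs)}
```

With the offsets of c, namely (-1, -1), (0, -1) and (-1, 0), this yields the increments (0, 1) and (1, 0). With the offsets of m instantiated for j - i = 3 it yields (0, 1) and (-1, 0), in agreement with the operations stated above.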

Typically, a function involves repeatedly solving common subproblems when 
it contains multiple recursive calls to itself. If there are multiple input increment 
operations, then any one may be used to incrementalize the program and finally 
form an optimized program; the rest may be used to further incrementalize 
the resulting optimized program, if it still involves repeatedly solving common 
subproblems. For example, for program c, either (i, j+1) or (i+1, j) will lead
to a final optimized program, and for program m, both (i-1, j) and (i, j+1)
need to be used, and they may be used in either order.



Static incrementalization. Given a program $\bar{f}_0$ and an input increment operation
$\oplus$, incrementalization symbolically transforms $\bar{f}_0(x')$ for $x' = x \oplus y$ to replace
subcomputations with retrievals of their values from the value $\bar{r}$ of $\bar{f}_0(x)$. This
exploits equality reasoning, based on control and data structures of the program
and properties of primitive operations. The resulting program $\bar{f}_0'$ uses $\bar{r}$ or parts
of $\bar{r}$ as additional arguments, called cache arguments, and satisfies: if $\bar{f}_0(x) = \bar{r}$
and $\bar{f}_0(x') = \bar{r}'$, then $\bar{f}_0'(x', \bar{r}) = \bar{r}'$.^1
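
The contract that an incremental version must satisfy can be checked on a deliberately small example. The sketch below is in Python and uses a hypothetical function that sums a list, with increment x' = cons(y, x) and decrement cdr. Neither this function nor the names `f_bar`, `f_bar_inc` and `prev` come from the paper, and the incremental version is written by hand here rather than derived by the algorithm.

```python
from typing import List


def f_bar(xs: List[int]) -> tuple:
    # Extended version of a list sum: <value, cached result of the recursive call>.
    if not xs:
        return (0,)
    sub = f_bar(xs[1:])
    return (xs[0] + sub[0], sub)


def prev(xs_new: List[int]) -> List[int]:
    # Decrement operation for the increment xs_new = [y] + xs.
    return xs_new[1:]


def f_bar_inc(xs_new: List[int], r_bar: tuple) -> tuple:
    # Incremental version: the recursive call on prev(xs_new) is replaced by
    # a retrieval from the cache argument r_bar.
    return (xs_new[0] + r_bar[0], r_bar)
```

For any non-empty list `xs_new`, `f_bar_inc(xs_new, f_bar(prev(xs_new)))` equals `f_bar(xs_new)`. The incremental version does a constant amount of work because the recursive call has become a lookup in the cache argument.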

The idea is to establish the strongest invariants, especially those about cache 
arguments, at all calls and maximize their usage. At the end, unused candidate 
cache arguments are eliminated. Reducing running time corresponds to max- 
imizing uses of invariants; reducing space corresponds to maintaining weakest 
invariants for all uses. It is important that the methods for establishing and using 
invariants are specialized so that they are automatable. The precise algorithm 
is described below. Its use is illustrated afterwards using the running examples. 

The algorithm starts with transforming $\bar{f}_0(x')$ for $x' = x \oplus y$ and $\bar{f}_0(x) = \bar{r}$
and first uses the decrement operation to establish an invariant about
function arguments. More precisely, it starts with transforming $\bar{f}_0(x')$ with invariant
$\bar{f}_0(prev(x')) = \bar{r}$, where $\bar{r}$ is a candidate cache argument. It may use other
invariants about x' if given. Invariants given or formed from the enclosing conditions
and bindings are called context. The algorithm transforms function applications
recursively. There are four cases at a function application $f(e'_1, \ldots, e'_n)$.

(1) If $f(e'_1, \ldots, e'_n)$ specializes, by definition of f, under its context to a base case,
i.e., an expression with no recursive calls, then replace it with the specialized
expression.

(2) Otherwise, if $f(e'_1, \ldots, e'_n)$ equals a retrieval from a cache argument based on
an invariant about the cache argument in its context, then replace it with
the retrieval.

(3) Otherwise, if an incremental version f' of f has been introduced, then
replace $f(e'_1, \ldots, e'_n)$ with a call to f' if the corresponding invariants can be
maintained; if some invariants cannot be maintained, then eliminate them
and retransform from where f' was introduced.

(4) Otherwise, introduce an incremental version f' of f and replace $f(e'_1, \ldots, e'_n)$
with a call to f', as described below.

In general, the replacement in case (1) is also done, repeatedly, if the specialized 
expression contains only recursive calls whose arguments are closer to, and will 
equal after a bounded number of such replacements, arguments for base cases or 
arguments on which retrievals can be done. Since a bounded number of invariants 
are used at a function application, as described below, the retransformation in 
case (3) can only be done a bounded number of times. So, the algorithm always 
terminates. 

To introduce an incremental version $f'$ of $f$ at $f(e'_1, ..., e'_n)$, let Inv
be the set of invariants about cache arguments or context information at
$f(e'_1, ..., e'_n)$. Those about cache arguments are of the form
$g_i(e_{i1}, ..., e_{in_i}) = e_{ir}$, where $e_{ir}$ is either a candidate cache
argument in the enclosing environment or a selector applied to such an argument.
Those about context information are of the form $e = true$, $e = false$, or
$v = e$, obtained from conditions or bindings. For simplicity, we assume that
all bound variables are renamed so that they are distinct. Introduce $f'$ to
compute $f(x''_1, ..., x''_n)$ for $x''_1 = e'_1, ..., x''_n = e'_n$, where
$x''_1, ..., x''_n$ are fresh variables, and deduce invariants about
$x''_1, ..., x''_n$ based on Inv. The deduction uses equations
$e'_1 = x''_1, ..., e'_n = x''_n$ to eliminate variables in Inv and can be done
automatically using Omega [44]. Resulting equations relating $x''_1, ..., x''_n$
are used also to duplicate other invariants deduced. If a resulting invariant
still uses a variable other than $x''_1, ..., x''_n$, discard it. Finally, for
each invariant about a cache argument, replace its right hand side with a fresh
variable, which becomes a candidate cache argument of $f'$. This yields the set
of invariants now associated with $f'$. Note that invariants about cache
arguments have the form $g_i(e''_{i1}, ..., e''_{in_i}) = \bar r_i$, where
$e''_{i1}, ..., e''_{in_i}$ use only variables $x''_1, ..., x''_n$, and $\bar r_i$
is a fresh variable. Among the left hand sides of these invariants, identify an
application of $f$ whose arguments have a minimum difference from
$x''_1, ..., x''_n$; if such an application exists, denote it
$f(e''_1, ..., e''_n)$.

^ In previous papers, we defined $\bar f_0'$ slightly differently: if
$\bar f_0(x) = \bar r$ and $\bar f_0(x \oplus y) = \bar r'$, then
$\bar f_0'(x, y, \bar r) = \bar r'$.

To obtain a definition of $f'$, unfold $f(x''_1, ..., x''_n)$ and then exploit
conditionals in $f(x''_1, ..., x''_n)$ and $f(e''_1, ..., e''_n)$ (if it exists)
and components in the candidate cache arguments of $f'$.

To exploit conditionals in $f(x''_1, ..., x''_n)$, move function applications
inside branches of the conditionals in $f(x''_1, ..., x''_n)$ whenever possible,
preserving control dependencies incurred by the order of conditional tests and
data dependencies incurred by the bindings. This is done by repeatedly applying
the following transformation in applicative order to the unfolded expression.
For any $t(e_1, ..., e_k)$ being $c(e_1, ..., e_k)$, $p(e_1, ..., e_k)$,
$f(e_1, ..., e_k)$, if $e_1$ then $e_2$ else $e_3$, or let $v = e_1$ in $e_2$,
if $e_i$ is if $e_{i1}$ then $e_{i2}$ else $e_{i3}$, where $i \neq 2, 3$ if $t$
is a conditional, and $i \neq 2$ or $e_{i1}$ does not depend on $v$ if $t$ is a
binding expression, then transform $t(e_1, ..., e_k)$ to
if $e_{i1}$ then $t(e_1, ..., e_{i-1}, e_{i2}, e_{i+1}, ..., e_k)$
else $t(e_1, ..., e_{i-1}, e_{i3}, e_{i+1}, ..., e_k)$.
This transformation preserves the semantics. It may increase the code size, but
it does not increase the running time of the resulting program.

To exploit the conditionals in $f(e''_1, ..., e''_n)$, introduce conditions from
$f(e''_1, ..., e''_n)$ in the transformed expression just obtained and put
function applications inside both branches that follow such a condition. This
is done by applying the following transformation in outermost-first order to
the conditionals in the transformed expression just obtained. For each branch
$e_i$ of the conditional that contains a function application, let $e$ be the
outermost condition in $f(e''_1, ..., e''_n)$ that is not implied by the context
of $e_i$; if $e$ uses only variables defined in the context of $e_i$ and takes
constant time to compute, and the two branches in $f(e''_1, ..., e''_n)$ that
depend on $e$ contain different function applications in some component, then
transform $e_i$ to if $e$ then $e_i$ else $e_i$.

To exploit each component in a candidate cache argument $\bar r_i$ where there
is an invariant $g_i(e''_{i1}, ..., e''_{in_i}) = \bar r_i$, for each branch in
the transformed expression, specialize $g_i(e''_{i1}, ..., e''_{in_i})$ under the
context of that branch. This may yield additional function applications that
equal various components of $\bar r_i$.

After these control structures and data structures are exploited, we simplify
primitive operations on $x''_1, ..., x''_n$ and transform function applications
recursively based on the four cases described. Finally, after we obtain a
definition of $f'$, replace the function application $f(e'_1, ..., e'_n)$ with a
call to $f'$ with arguments $e'_1, ..., e'_n$ and cache arguments $e_{ir}$'s for
the invariants used.
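
The conditional-lifting transformation described above can be illustrated with a small sketch. It is only an illustration under simplifying assumptions, not the implementation used in the paper: expressions are a toy AST with just conditionals and applications (binding expressions are omitted), and the names `If`, `App`, and `lift` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


@dataclass(frozen=True)
class App:
    fn: str
    args: Tuple["Expr", ...]


# Strings stand for variables, ints for constants (assumption of this sketch).
Expr = Union[If, App, str, int]


def lift(e: Expr) -> Expr:
    """Move applications inside the branches of conditionals in their arguments."""
    if isinstance(e, App):
        # Applicative order: transform the arguments first.
        args = tuple(lift(a) for a in e.args)
        for i, a in enumerate(args):
            if isinstance(a, If):
                yes = App(e.fn, args[:i] + (a.then,) + args[i + 1:])
                no = App(e.fn, args[:i] + (a.other,) + args[i + 1:])
                return If(a.cond, lift(yes), lift(no))
        return App(e.fn, args)
    if isinstance(e, If):
        cond, then, other = lift(e.cond), lift(e.then), lift(e.other)
        # Only a conditional in the test position (i = 1) is lifted out of a
        # conditional; the branches (i = 2, 3) stay where they are.
        if isinstance(cond, If):
            return If(cond.cond,
                      lift(If(cond.then, then, other)),
                      lift(If(cond.other, then, other)))
        return If(cond, then, other)
    return e
```

For instance, `lift(App("f", ("x", If("c", "a", "b"))))` yields a conditional on `c` whose branches are `f(x, a)` and `f(x, b)`. The duplicated application is the code-size increase mentioned in the text; each run of the program still evaluates only one of the copies.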

The simplifications and equality reasoning needed for all the problems we 
have encountered involve only recursive data structures and Presburger arith- 
metic and can be fully automated. 



Longest common subsequence. Incrementalize $\bar c$ under
$\langle i', j' \rangle = \langle i+1, j \rangle$. We start with
$\bar c(i', j')$, with cache argument $\bar r$ and invariant
$\bar c(prev(i', j')) = \bar c(i'-1, j') = \bar r$; the invariants $i', j' > 0$
may also be included but do not affect any transformation below, so they are
omitted for convenience. This is case (4), so we introduce incremental version
$\bar c'$ to compute $\bar c(i', j')$. Unfolding the definition of $\bar c$ and
listing conditions to be exploited, we obtain the code below. The false branch
of $\bar c(i', j')$ is duplicated with the additional condition
$i'-1 = 0 \vee j' = 0$, which is copied from the condition in the definition of
$\bar c(i'-1, j')$; for convenience, the three function applications bound to
$v_1$ to $v_3$ are not put inside branches that follow condition
$x[i'] = y[j']$, since their transformations are not affected, and
simplification at the end can take them back out.

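$$
\begin{array}{l}
\bar c(i', j') = \mathbf{if}\ i' = 0 \vee j' = 0\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else\ if}\ i'-1 = 0 \vee j' = 0\ \mathbf{then} \\
\quad\quad \mathbf{let}\ v_1 = \bar c(i'-1, j'-1)\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_2 = \bar c(i', j'-1)\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_3 = \bar c(i'-1, j')\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle \mathit{1st}(v_1) + 1, v_1, v_2, v_3 \rangle \\
\quad\quad \mathbf{else}\ \langle \max(\mathit{1st}(v_2), \mathit{1st}(v_3)), v_1, v_2, v_3 \rangle \\
\quad \mathbf{else} \\
\quad\quad \mathbf{let}\ v_1 = \bar c(i'-1, j'-1)\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_2 = \bar c(i', j'-1)\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_3 = \bar c(i'-1, j')\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle \mathit{1st}(v_1) + 1, v_1, v_2, v_3 \rangle \\
\quad\quad \mathbf{else}\ \langle \max(\mathit{1st}(v_2), \mathit{1st}(v_3)), v_1, v_2, v_3 \rangle
\end{array}
$$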

In the second branch, $i'-1 = 0$ is true, since $j' = 0$ would imply that the
first branch is taken. The first and third calls fall in case (1) and specialize
to $\langle 0 \rangle$. The second call falls in case (3) and equals a recursive
call to $\bar c'$ with arguments $i', j'-1$ and cache argument
$\langle 0 \rangle$ since we have a corresponding invariant
$\bar c(i'-1, j'-1) = \langle 0 \rangle$. Additional simplification unwinds
bindings for $v_1$ and $v_3$, simplifies $\mathit{1st}(\langle 0 \rangle) + 1$
to $1$, and simplifies $\max(\mathit{1st}(v_2), \mathit{1st}(\langle 0 \rangle))$
to $\mathit{1st}(v_2)$.

In the third branch, condition $i'-1 = 0 \vee j' = 0$ is false;
$\bar c(i'-1, j')$ by definition of $\bar c$ equals its second branch where
$\bar c(i'-1, j'-1)$ is bound to $v_2$, and thus $\bar c(i'-1, j') = \bar r$
implies $\bar c(i'-1, j'-1) = \mathit{3rd}(\bar r)$. The first call falls in
case (2) and equals $\mathit{3rd}(\bar r)$. The second call falls in case (3)
and equals a recursive call to $\bar c'$ with arguments $i', j'-1$ and cache
argument $\mathit{3rd}(\bar r)$ since we have a corresponding invariant
$\bar c(i'-1, j'-1) = \mathit{3rd}(\bar r)$. The third call falls in case (2)
and equals $\bar r$. We obtain

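$$
\begin{array}{l}
\bar c'(i', j', \bar r) = \mathbf{if}\ i' = 0 \vee j' = 0\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else\ if}\ i'-1 = 0\ \mathbf{then} \\
\quad\quad \mathbf{let}\ v_2 = \bar c'(i', j'-1, \langle 0 \rangle)\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle 1, \langle 0 \rangle, v_2, \langle 0 \rangle \rangle \\
\quad\quad \mathbf{else}\ \langle \mathit{1st}(v_2), \langle 0 \rangle, v_2, \langle 0 \rangle \rangle \\
\quad \mathbf{else} \\
\quad\quad \mathbf{let}\ v_1 = \mathit{3rd}(\bar r)\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_2 = \bar c'(i', j'-1, \mathit{3rd}(\bar r))\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_3 = \bar r\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle \mathit{1st}(v_1) + 1, v_1, v_2, v_3 \rangle \\
\quad\quad \mathbf{else}\ \langle \max(\mathit{1st}(v_2), \mathit{1st}(v_3)), v_1, v_2, v_3 \rangle
\end{array}
$$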

If $\bar r = \bar c(i'-1, j')$, then $\bar c'(i', j', \bar r) = \bar c(i', j')$,
and $\bar c'$ takes time and space linear in $j'$, for caching and maintaining
a linear list.
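
The stated property of $\bar c'$ can be checked on small inputs with a direct transcription into Python. This is a sketch under assumptions: tuples stand for the paper's $\langle ... \rangle$ structures, strings are indexed from 1 as in the paper (hence the `- 1` offsets), the extended function $\bar c$ is read off from the unfolded code above, and the names `make_lcs`, `cbar`, and `cbar_inc` are ours.

```python
def make_lcs(x, y):
    """Return (cbar, cbar_inc) for strings x and y, indexed from 1 as in the text."""

    def cbar(i, j):
        # Extended function: the result tuple also caches the three subresults.
        if i == 0 or j == 0:
            return (0,)
        v1 = cbar(i - 1, j - 1)
        v2 = cbar(i, j - 1)
        v3 = cbar(i - 1, j)
        if x[i - 1] == y[j - 1]:
            return (v1[0] + 1, v1, v2, v3)
        return (max(v2[0], v3[0]), v1, v2, v3)

    def cbar_inc(i, j, r):
        # Incremental version: r is assumed to equal cbar(i - 1, j).
        if i == 0 or j == 0:
            return (0,)
        if i - 1 == 0:
            v2 = cbar_inc(i, j - 1, (0,))
            if x[i - 1] == y[j - 1]:
                return (1, (0,), v2, (0,))
            return (v2[0], (0,), v2, (0,))
        v1 = r[2]                          # 3rd(r) = cbar(i - 1, j - 1)
        v2 = cbar_inc(i, j - 1, r[2])
        v3 = r
        if x[i - 1] == y[j - 1]:
            return (v1[0] + 1, v1, v2, v3)
        return (max(v2[0], v3[0]), v1, v2, v3)

    return cbar, cbar_inc
```

With `cbar, cbar_inc = make_lcs("ABCB", "BDCA")`, the call `cbar_inc(i, j, cbar(i - 1, j))` returns the same tuple as `cbar(i, j)` for every `i >= 1`, while making only one recursive call per step in `j`.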

Matrix-chain multiplication. Incrementalize $\bar m$ under
$\langle i', j' \rangle = \langle i, j+1 \rangle$. We start with
$\bar m(i', j')$, with cache argument $\bar r$ and invariants
$\bar m(i', j'-1) = \bar r$ and $i' \le j'$. This is case (4), so we introduce
incremental version $\bar m'$ to compute $\bar m(i', j')$. Unfolding $\bar m$,
listing conditions, and specializing the second branch, we obtain the code
below.

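$$
\begin{array}{l}
\bar m(i', j') = \mathbf{if}\ i' = j'\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else\ if}\ i' = j'-1\ \mathbf{then}\ \langle p[i'-1] * p[i'] * p[j'], \langle 0 \rangle, \langle 0 \rangle \rangle \\
\quad \mathbf{else}\ \mathit{msub}(i', j', i')
\end{array}
$$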

In the third branch, condition $i' = j'-1$ is false; $\bar m(i', j'-1)$ by
definition of $\bar m$ equals $\mathit{msub}(i', j'-1, i')$, and thus
$\bar m(i', j'-1) = \bar r$ implies $\mathit{msub}(i', j'-1, i') = \bar r$. The
call $\mathit{msub}(i', j', i')$ falls in case (4). We introduce
$\mathit{msub}'$ to compute $\mathit{msub}(i'', j'', k'')$ for
$i'' = i', j'' = j', k'' = i'$, with invariants
$\mathit{msub}(i', j'-1, i') = \bar r$, $\bar m(i', j'-1) = \bar r$,
$i' \le j'$, $i' \neq j'$, $i' \neq j'-1$. Express these invariants as
invariants on $i'', j'', k''$ using Omega, and introduce fresh variables
$\bar r_i$ for candidate cache arguments. We obtain



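$$
\begin{array}{ll}
\mathit{msub}(i'', j''-1, k'') = \bar r_1,\ \bar m(i'', j''-1) = \bar r_2, & i'' \le j'',\ i'' \neq j'',\ i'' \neq j''-1,\ k'' = i'', \\
\mathit{msub}(i'', j''-1, i'') = \bar r_3, & \\
\mathit{msub}(k'', j''-1, k'') = \bar r_4,\ \bar m(k'', j''-1) = \bar r_5, & k'' \le j'',\ k'' \neq j'',\ k'' \neq j''-1, \\
\mathit{msub}(k'', j''-1, i'') = \bar r_6 &
\end{array}
\qquad (2)
$$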

where equation $k'' = i''$ is an additional invariant deduced, and invariants
not on the first line are duplications of those in the first line based on
$k'' = i''$. Arguments of $\mathit{msub}(i'', j''-1, k'')$ have a minimum
difference from arguments of $\mathit{msub}(i'', j'', k'')$.

Unfolding $\mathit{msub}(i'', j'', k'')$ and listing conditions to be exploited,
we obtain the following code. The code for $v_1$ and $v_2$ is duplicated for
both branches that follow the condition $k''+1 = j''$. The code for $v$ is
duplicated for both branches that follow the additional condition
$k''+1 = j''-1$, which is copied from the condition in the definition of
$\mathit{msub}(i'', j''-1, k'')$.



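$$
\begin{array}{l}
\mathit{msub}(i'', j'', k'') = \mathbf{if}\ k''+1 = j''\ \mathbf{then} \\
\quad\quad \mathbf{let}\ v_1 = \bar m(i'', k'')\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_2 = \bar m(k''+1, j'')\ \mathbf{in} \\
\quad\quad \mathbf{let}\ s = \mathit{1st}(v_1) + \mathit{1st}(v_2) + p[i''-1] * p[k''] * p[j'']\ \mathbf{in} \\
\quad\quad \langle s, v_1, v_2 \rangle \\
\quad \mathbf{else} \\
\quad\quad \mathbf{let}\ v_1 = \bar m(i'', k'')\ \mathbf{in} \\
\quad\quad \mathbf{let}\ v_2 = \bar m(k''+1, j'')\ \mathbf{in} \\
\quad\quad \mathbf{let}\ s = \mathit{1st}(v_1) + \mathit{1st}(v_2) + p[i''-1] * p[k''] * p[j'']\ \mathbf{in} \\
\quad\quad \mathbf{if}\ k''+1 = j''-1\ \mathbf{then} \\
\quad\quad\quad \mathbf{let}\ v = \mathit{msub}(i'', j'', k''+1)\ \mathbf{in} \\
\quad\quad\quad \langle \min(s, \mathit{1st}(v)), v_1, v_2, v \rangle \\
\quad\quad \mathbf{else} \\
\quad\quad\quad \mathbf{let}\ v = \mathit{msub}(i'', j'', k''+1)\ \mathbf{in} \\
\quad\quad\quad \langle \min(s, \mathit{1st}(v)), v_1, v_2, v \rangle
\end{array}
$$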



The first branch is simplified away since we have invariant $k'' \neq j''-1$.

In the other branch, $\mathit{msub}(i'', j''-1, k'')$ by definition of
$\mathit{msub}$ has $\bar m(i'', k'')$ bound to $v_1$ and
$\bar m(k''+1, j''-1)$ bound to $v_2$, and thus
$\mathit{msub}(i'', j''-1, k'') = \bar r_1$ implies
$\bar m(i'', k'') = \mathit{2nd}(\bar r_1)$ and
$\bar m(k''+1, j''-1) = \mathit{3rd}(\bar r_1)$. The first call falls in case
(1), since we have invariant $k'' = i''$, and equals $\langle 0 \rangle$. The
second call falls in case (3) and equals a recursive call to $\bar m'$ with
arguments $k''+1, j''$ and cache argument $\mathit{3rd}(\bar r_1)$ since we have
a corresponding invariant $\bar m(k''+1, j''-1) = \mathit{3rd}(\bar r_1)$.

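In the branch where $k''+1 = j''-1$ is true, the call to $\mathit{msub}$ falls
in case (1) and equals

$$
\begin{array}{l}
\mathbf{let}\ v_1 = \bar m(i'', j''-1)\ \mathbf{in}\ \mathbf{let}\ v_2 = \bar m(j'', j'')\ \mathbf{in} \\
\mathbf{let}\ s = \mathit{1st}(v_1) + \mathit{1st}(v_2) + p[i''-1] * p[k''+1] * p[j'']\ \mathbf{in}\ \langle s, v_1, v_2 \rangle
\end{array}
$$

which then equals
$\langle \mathit{1st}(\bar r_2) + p[i''-1] * p[k''+1] * p[j''], \bar r_2, \langle 0 \rangle \rangle$
because the first call equals $\bar r_2$ and the second call equals
$\langle 0 \rangle$.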

In the last branch, the call to $\mathit{msub}$ falls in case (3). However, the
arguments of this call do not satisfy the invariant corresponding to
$k'' = i''$ and those on the third and fourth lines in (2). So we delete these
invariants and retransform $\mathit{msub}$. Everything remains the same except
that $\bar m(i'', k'')$ does not fall in case (1) any more; it falls in case (2)
and equals $\mathit{2nd}(\bar r_1)$. We replace this call to $\mathit{msub}$ by
a recursive call to $\mathit{msub}'$ with arguments $i'', j'', k''+1$ and cache
arguments $\mathit{4th}(\bar r_1), \bar r_2, \bar r_3$ since we have
corresponding invariants
$\mathit{msub}(i'', j''-1, k''+1) = \mathit{4th}(\bar r_1)$,
$\bar m(i'', j''-1) = \bar r_2$, $\mathit{msub}(i'', j''-1, i'') = \bar r_3$.

We eliminate the unused candidate cache argument $\bar r_3$, and we replace the
original call $\mathit{msub}(i', j', i')$ by
$\mathit{msub}'(i', j', i', \bar r, \bar r)$. We obtain

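$$
\begin{array}{l}
\bar m'(i', j', \bar r) = \mathbf{if}\ i' = j'\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else\ if}\ i' = j'-1\ \mathbf{then}\ \langle p[i'-1] * p[i'] * p[j'], \langle 0 \rangle, \langle 0 \rangle \rangle \\
\quad \mathbf{else}\ \mathit{msub}'(i', j', i', \bar r, \bar r) \\
\\
\mathit{msub}'(i'', j'', k'', \bar r_1, \bar r_2) = \\
\quad \mathbf{let}\ v_1 = \mathit{2nd}(\bar r_1)\ \mathbf{in} \\
\quad \mathbf{let}\ v_2 = \bar m'(k''+1, j'', \mathit{3rd}(\bar r_1))\ \mathbf{in} \\
\quad \mathbf{let}\ s = \mathit{1st}(v_1) + \mathit{1st}(v_2) + p[i''-1] * p[k''] * p[j'']\ \mathbf{in} \\
\quad \mathbf{if}\ k''+1 = j''-1\ \mathbf{then} \\
\quad\quad \mathbf{let}\ v = \langle \mathit{1st}(\bar r_2) + p[i''-1] * p[k''+1] * p[j''], \bar r_2, \langle 0 \rangle \rangle\ \mathbf{in} \\
\quad\quad \langle \min(s, \mathit{1st}(v)), v_1, v_2, v \rangle \\
\quad \mathbf{else} \\
\quad\quad \mathbf{let}\ v = \mathit{msub}'(i'', j'', k''+1, \mathit{4th}(\bar r_1), \bar r_2)\ \mathbf{in} \\
\quad\quad \langle \min(s, \mathit{1st}(v)), v_1, v_2, v \rangle
\end{array}
$$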

If $\bar r = \bar m(i', j'-1)$, then $\bar m'(i', j', \bar r) = \bar m(i', j')$,
and $\bar m'$ is an exponential factor faster. However, $\bar m'$ still takes
exponential time due to repeated calls to $\bar m'$; incrementalizing again
under an increment to the other parameter, we obtain a linear-time incremental
program.
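
As with the previous example, the relationship between $\bar m$ and $\bar m'$ can be checked by transcribing both into Python. This is a sketch under assumptions: tuples stand for $\langle ... \rangle$, `p` is the list of matrix dimensions with matrix $i$ of size `p[i-1]` by `p[i]`, the extended functions are read off from the unfolded code above, and the names `make_chain`, `mbar`, `msub`, `mbar_inc`, and `msub_inc` are ours.

```python
def make_chain(p):
    """Return (mbar, mbar_inc) for the dimension list p."""

    def mbar(i, j):
        # Extended function; requires 1 <= i <= j.
        if i == j:
            return (0,)
        return msub(i, j, i)

    def msub(i, j, k):
        v1 = mbar(i, k)
        v2 = mbar(k + 1, j)
        s = v1[0] + v2[0] + p[i - 1] * p[k] * p[j]
        if k + 1 == j:
            return (s, v1, v2)
        v = msub(i, j, k + 1)
        return (min(s, v[0]), v1, v2, v)

    def mbar_inc(i, j, r):
        # Incremental version: r is assumed to equal mbar(i, j - 1) when i < j.
        if i == j:
            return (0,)
        if i == j - 1:
            return (p[i - 1] * p[i] * p[j], (0,), (0,))
        return msub_inc(i, j, i, r, r)

    def msub_inc(i, j, k, r1, r2):
        # Assumes r1 = msub(i, j - 1, k) and r2 = mbar(i, j - 1).
        v1 = r1[1]                          # 2nd(r1) = mbar(i, k)
        v2 = mbar_inc(k + 1, j, r1[2])      # 3rd(r1) = mbar(k + 1, j - 1)
        s = v1[0] + v2[0] + p[i - 1] * p[k] * p[j]
        if k + 1 == j - 1:
            v = (r2[0] + p[i - 1] * p[k + 1] * p[j], r2, (0,))
            return (min(s, v[0]), v1, v2, v)
        v = msub_inc(i, j, k + 1, r1[3], r2)  # 4th(r1) = msub(i, j - 1, k + 1)
        return (min(s, v[0]), v1, v2, v)

    return mbar, mbar_inc
```

For `p = [30, 35, 15, 5, 10, 20, 25]`, a standard textbook instance, `mbar_inc(i, j, mbar(i, j - 1))` agrees with `mbar(i, j)` for all `1 <= i < j <= 6`. The recursive calls to `mbar_inc` inside `msub_inc` are the repeated calls that keep the running time exponential at this stage.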



5 Step 3: Pruning unnecessary values 

Among the components maintained by $\bar f_0'(x', \bar r)$, the first one is the
return value of $f_0(x')$. Components in $\bar r$ that are not useful for
computing this value need not be cached and maintained. We prune the programs
$\bar f_0$ and $\bar f_0'$ and obtain a program $\hat f_0$ that caches only the
useful values and a program $\hat f_0'$ that uses and maintains only the useful
values. Finally, we form an optimized program that computes $f_0$ by using the
base cases in $f_0$ and by repeatedly using the incremental version
$\hat f_0'$.

Pruning. Pruning requires a dependence analysis that can precisely describe
substructures of recursive trees [32]. We use an analysis method based on regular
tree grammars [28]. We have implemented a simplified version that uses set
constraints to efficiently produce precise analysis results. Pruning can save
space, as well as time, and reduce code size.

For example, in program $\bar c'$, only the third component of $\bar r$ is
useful. Pruning the second and fourth components of $\bar c$ and $\bar c'$,
which moves the third up to the second, and doing a few simplifications, which
transforms $\mathit{1st}(\bar c)$ back to $c$ and unwinds bindings for $v_1$ and
$v_3$, we obtain $\hat c$ and $\hat c'$ below:

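$$
\begin{array}{l}
\hat c(i, j) = \mathbf{if}\ i = 0 \vee j = 0\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else}\ \mathbf{let}\ v_2 = \hat c(i, j-1)\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i] = y[j]\ \mathbf{then}\ \langle c(i-1, j-1) + 1, v_2 \rangle \\
\quad\quad \mathbf{else}\ \langle \max(\mathit{1st}(v_2), c(i-1, j)), v_2 \rangle \\
\\
\hat c'(i', j', \hat r) = \mathbf{if}\ i' = 0 \vee j' = 0\ \mathbf{then}\ \langle 0 \rangle \\
\quad \mathbf{else\ if}\ i'-1 = 0\ \mathbf{then} \\
\quad\quad \mathbf{let}\ v_2 = \hat c'(i', j'-1, \langle 0 \rangle)\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle 1, v_2 \rangle \\
\quad\quad \mathbf{else}\ \langle \mathit{1st}(v_2), v_2 \rangle \\
\quad \mathbf{else}\ \mathbf{let}\ v_2 = \hat c'(i', j'-1, \mathit{2nd}(\hat r))\ \mathbf{in} \\
\quad\quad \mathbf{if}\ x[i'] = y[j']\ \mathbf{then}\ \langle \mathit{1st}(\mathit{2nd}(\hat r)) + 1, v_2 \rangle \\
\quad\quad \mathbf{else}\ \langle \max(\mathit{1st}(v_2), \mathit{1st}(\hat r)), v_2 \rangle
\end{array}
$$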

Pruning leaves programs $\bar m$ and $\bar m'$ unchanged. We obtain the same
programs $\hat m$ and $\hat m'$, respectively.



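Forming optimized programs. We redefine functions $f_0$ and $\hat f_0$ and use
function $\hat f_0'$:

$$
\begin{array}{l}
f_0(x) = \mathit{1st}(\hat f_0(x)) \\
\hat f_0(x) = \mathbf{if}\ \mathit{base\_cond}(x)\ \mathbf{then}\ \mathit{base\_val}(x)\ \mathbf{else}\ \mathbf{let}\ \hat r = \hat f_0(prev(x))\ \mathbf{in}\ \hat f_0'(x, \hat r)
\end{array}
$$

where $\mathit{base\_cond}$ is the base-case condition, and $\mathit{base\_val}$
is the corresponding value, both copied from the definition of $\hat f_0$. In
general, there may be multiple base cases, and we just list them all.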

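For examples $c$ and $m$, we obtain directly

$$
\begin{array}{l}
c(i, j) = \mathit{1st}(\hat c(i, j)) \\
\hat c(i, j) = \mathbf{if}\ i = 0 \vee j = 0\ \mathbf{then}\ \langle 0 \rangle\ \mathbf{else}\ \mathbf{let}\ \hat r = \hat c(i-1, j)\ \mathbf{in}\ \hat c'(i, j, \hat r) \\
m(i, j) = \mathit{1st}(\hat m(i, j)) \\
\hat m(i, j) = \mathbf{if}\ i = j\ \mathbf{then}\ \langle 0 \rangle\ \mathbf{else}\ \mathbf{let}\ \hat r = \hat m(i, j-1)\ \mathbf{in}\ \hat m'(i, j, \hat r)
\end{array}
$$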

where $\hat c'$ and $\hat m'$ are as obtained above. For $c(n, m)$, while the
original program takes $O(2^{n+m})$ time, the optimized program takes
$O(n * m)$ time. For $m(1, n)$, while the original program takes $O(n * 3^n)$
time, the optimized program takes
O(n^ * 2^n) time. Incrementalizing the optimized program again under the
increment to the other parameter, we obtain an optimized program that takes
$O(n^3)$ time.
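
The optimized longest-common-subsequence program can be sketched in Python as follows. It is a transcription of $\hat c$ and $\hat c'$ as given above, under the same assumptions as before (tuples for $\langle ... \rangle$, 1-based string positions); the function names are ours. Plain recursion is used, so very long inputs would exceed Python's default recursion limit.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of x and y."""

    def c_hat_inc(i, j, r):
        # Pruned incremental version: r is assumed to equal c_hat(i - 1, j).
        if i == 0 or j == 0:
            return (0,)
        if i - 1 == 0:
            v2 = c_hat_inc(i, j - 1, (0,))
            if x[i - 1] == y[j - 1]:
                return (1, v2)
            return (v2[0], v2)
        v2 = c_hat_inc(i, j - 1, r[1])      # 2nd(r) = c_hat(i - 1, j - 1)
        if x[i - 1] == y[j - 1]:
            return (r[1][0] + 1, v2)
        return (max(v2[0], r[0]), v2)

    def c_hat(i, j):
        # Base case copied from the definition; otherwise one incremental step.
        if i == 0 or j == 0:
            return (0,)
        return c_hat_inc(i, j, c_hat(i - 1, j))

    return c_hat(len(x), len(y))[0]
```

Each call `c_hat(i, j)` makes one call to `c_hat(i - 1, j)` and one chain of `j` calls to `c_hat_inc`, which is where the $O(n * m)$ bound comes from. The cached value is a nested pair, that is, a linear list, rather than an array.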


6 Summary and discussion

Our method for dynamic programming is completely static, fully automatable, 
and efficient. In particular, it is based on a general approach for program
optimization: incrementalization. Although our static incrementalization allows only
one incremental version for each original function, it is still powerful enough to 
incrementalize all examples in [33,32,31], including various list manipulations, 
matrix computations, attribute evaluation, and graph problems. We believe that 
our method can perform dynamic programming for all problems whose solutions 
involve recursively solving subproblems that overlap, but a formal justification 
awaits more rigorous study. 

In our method, only values that are necessary for the incrementalization 
are stored, in appropriate data structures. For the longest-common-subsequence 
example, only a linear list is needed, whereas in standard textbooks, a quadratic 
two-dimensional array is used, and an additional optimization is needed to reduce 
it to a one-dimensional array [14]. For the matrix-chain-multiplication example, 
our optimized program uses a list of lists that forms a triangle shape, rather 
than a two-dimensional array of square shape. It is nontrivial to see that
recursive data structures give the same asymptotic speedup as arrays for these
examples. There are dynamic programming problems, e.g., 0-1 knapsack, for which
the use of arrays, with constant-time access of elements, helps achieve desired asymptotic
speedups. Such situations become evident when doing incrementalization and 
can be taken care of easily. This will be described in a future paper. Although 
we present the optimizations for a functional language, the underlying principle 
is general and has been applied to programs that use loops and arrays [27,30]. 
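
For comparison, the textbook space reduction mentioned above can be sketched as follows. This is the standard row-by-row tabulation that keeps only the previous row, not code derived by the method of this paper; the name `lcs_length_rows` is ours.

```python
def lcs_length_rows(x, y):
    """Textbook tabulation of LCS length that keeps only one previous row."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(cur[j - 1], prev[j])
        prev = cur
    return prev[len(y)]
```

Here the row `prev` plays the role that the cached linear list plays in the derived program: it holds the results for the previous value of the first parameter.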

Some values computed in a hoisted program might not be computed by 
the original program and are therefore called auxiliary information [31]. Both 
incrementalization and pruning produce programs that are at least as fast as
the given program, but caching auxiliary information may result in a slower 
program on certain inputs. We can determine statically whether such information 
is cached in the final program. If so, we can use time and space analysis [29] to 
determine whether it is worthwhile to use and maintain such information. 

Many dynamic programming algorithms can be further improved by exploit- 
ing additional properties of the given problems [7], e.g., greedy properties. Our 
method is not specially aimed at discovering such properties. Nevertheless, it can 
maintain such properties once they are added. For example, for the paragraph- 
formatting problem [14,17], we can derive a quadratic-time algorithm that uses 
dynamic programming; if the original program has a simple extra conditional 
that follows from a greedy property, our derived dynamic programming pro- 
gram uses it as well and takes linear time with a factor of line width. How to 
systematically discover and use these additional properties is a subject for future 
study. 



7 Implementation and experimentation results 

All three steps have been implemented in a prototype system, CACHET. The 
incrementalization step as currently implemented is semi-automatic [26] and is 
being automated. The implementation uses the Synthesizer Generator [47]. 

Fig. 3 summarizes some of the examples derived (most of them semi-automatically
and some automatically) and compares their asymptotic running times. ^
The second column shows whether more than one cache argument is needed in an 
incremental program. The third column shows whether the incremental program 
computes values not necessarily computed by the original program. Paragraph 
formatting 2 [17] includes a conditional that reflects a greedy property. The “a” 
in the third column for the last two examples shows that cached values are stored 
in arrays. Performance measurements confirmed drastic speedups. 



[Table of Fig. 3 lost in extraction; column alignment is not recoverable.
Legible row labels: paragraph formatting [14], 0-1 knapsack [14],
context-free-grammar parsing [2]. Legible running-time entries include
$O(n * 3^n)$, $O(n * 2^n)$, $O(n * weight)$, and $O(n * (2 * size + 1)^n)$.]


Fig. 3. Summary of Examples. 



8 Related work and conclusion 

Dynamic programming was first formulated by Bellman [4] and has been studied 
extensively since [51]. Bird [5], de Moor [16], and others have studied it in the con- 
text of program transformation. While some works address the derivation of re- 
cursive equations, notably the work by Smith [50], our work addresses the deriva- 
tion of efficient programs that use tabulation. Previous methods for this problem 
either apply to specific subclasses of problems [13,40,10,12,42,21] or give general
frameworks and strategies rather than precise algorithms
[52,9,5,48,6,3,39,49,8,16,41,15]. Our work is based on the general principle of
incrementalization
[38,31] and consists of precise program analyses and transformations. 

In particular, tupling [40,41] aims to compute multiple values together in an
efficient way. It is improved to be automatic on subclasses of problems [10] and
to work on more general forms [12]. It is also extended to store lists of values [42],
but such lists are generated in a fixed way, which is not the most appropriate
way for many programs. A special form of tupling can eliminate multiple data
traversals for many functions [21]. A method specialized for introducing arrays
was proposed for tabulation [11], but as our method has shown, arrays are not
essential for the speedup of many programs; their arrays are complicated to
derive and often consume more space than necessary.

^ Matrix-chain multiplication, optimal binary search trees, optimal polygon
triangulation, and other problems not in Fig. 3 have similar control structures
for recursive calls. Yet, it is nontrivial for an automated system to handle all
of them uniformly.

Compared with our previous work for incrementalizing functional programs 
[33,32,31], this work contains drastic improvements. First, our previous work
addresses the systematic derivation of an incremental program $f'$ given both
program $f$ and operation $\oplus$. This paper describes a systematic method for
identifying an appropriate operation $\oplus$ given a function $f$ and using the
derived incremental program $f'$ to form an optimized version of $f$. Second,
since it is difficult
to introduce appropriate cache arguments, our previous method allows at most 
one cache argument for each incremental function. This paper allows multiple 
cache arguments, without which many programs could not be incrementalized, 
e.g., the matrix-chain-multiplication program. Third, our previous method in- 
troduces incremental functions using an on-line strategy, i.e., on-the-fly during 
the transformation, so it may attempt to introduce an unbounded number of 
new functions and thus not terminate. The algorithm in this paper statically 
determines one incremental function for each one in the original program, i.e., it 
is monovariant; even though it is theoretically more limited, it is simpler, always 
terminates, and is able to incrementalize all previous examples. Finally, based 
on the idea of cache-and-prune that was proposed earlier [32], the method in 
this paper uses hoisting to extend the set of intermediate results [32] to include 
a kind of auxiliary information [31] that is sufficient for dynamic programming. 
This method is simpler than our previous general method for discovering aux- 
iliary information [31]. Additionally, we now use a more precise and efficient
dependence analysis for pruning [28].

Finite differencing [38,37] is based on the same underlying principle as incre- 
mental computation. Paige has explicitly asked whether finite differencing can
be generalized to handle dynamic programming [36]; it is clear that he perceived
an important connection. However, finite differencing has been formulated for 
set-based languages, while straightforward solutions to dynamic programming 
problems are usually formulated as recursive functions, so it was difficult to 
actually establish the connection. 

Overall, being able to incrementalize complicated recursion in a systematic 
way is a more drastic improvement complementing previous methods for incre- 
mentalizing loops [38,27]. Our new method based on static incrementalization is 
general and fully automatable. Based on our existing implementation, we believe 
that a complete system will perform incrementalization efficiently. 